Automatic Estimation of Visual Speech Parameters
Abstract: Visual speech parameter estimation plays an important role in visual speech research. From the facial animation parameters (FAPs) defined in MPEG-4, 24 parameters directly related to articulation are selected to describe visual speech. A statistical learning method is combined with a rule-based method: facial color probability distributions, together with prior knowledge of shape and edges, are used to track the lip contour and the facial feature points, which yields fairly accurate tracking. After high-frequency noise in the reference-point tracks is removed with a low-pass filter, the main face pose is estimated from the four most prominent reference points on the face, which eliminates the effect of global motion. Finally, accurate visual speech parameters are computed from the motion of the facial feature points, and these parameters have been used in practical applications.
Source: Journal of Computer Research and Development (《计算机研究与发展》; indexed in EI, CSCD, and the Peking University Core Journals list), 2005, No. 7, pp. 1185-1190 (6 pages)
Funding: Specialized Research Fund for the Doctoral Program of Higher Education (20010003049); University of Science and Technology Beijing research fund (20040509190)
Keywords: visual speech; facial animation parameter (FAP); Gaussian mixture model (GMM); deformable template
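
The abstract describes two steps that are easy to illustrate: low-pass filtering the tracked reference points, and using four reference points to remove global head motion before measuring feature-point movement. The sketch below is only an illustration under stated assumptions, not the authors' method: it assumes a moving-average filter as the low-pass filter and a 2D similarity transform (scale, rotation, translation) as the pose model, whereas the paper may use a different filter and a 3D pose. The function names `low_pass`, `fit_similarity`, and `remove_global_motion` are hypothetical.

```python
import numpy as np

def low_pass(track, window=5):
    """Moving-average low-pass filter over time for a (T, N, 2) point track.
    Assumes an odd window; edges are padded by repeating the end frames."""
    kernel = np.ones(window) / window
    pad = window // 2
    padded = np.pad(track, ((pad, pad), (0, 0), (0, 0)), mode="edge")
    out = np.empty_like(track, dtype=float)
    for n in range(track.shape[1]):
        for d in range(2):
            out[:, n, d] = np.convolve(padded[:, n, d], kernel, mode="valid")
    return out

def fit_similarity(src, dst):
    """Least-squares s, R, t such that dst ~= s * src @ R.T + t (2D points)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    a, b = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(b.T @ a)          # cross-covariance of the two sets
    D = np.diag([1.0, np.sign(np.linalg.det(U @ Vt))])  # forbid reflections
    R = U @ D @ Vt
    s = (S * np.diag(D)).sum() / (a ** 2).sum()
    t = mu_d - s * R @ mu_s
    return s, R, t

def remove_global_motion(features, refs, neutral_refs):
    """Map feature points of the current frame into the neutral-face frame,
    using the transform that aligns the current reference points to the
    neutral reference points (assumed here: four points per frame)."""
    s, R, t = fit_similarity(refs, neutral_refs)
    return s * features @ R.T + t

# Usage: a synthetic head motion applied to both reference and feature points.
neutral_refs = np.array([[0.0, 0.0], [4.0, 0.0], [4.0, 3.0], [0.0, 3.0]])
neutral_feat = np.array([[2.0, 1.0], [2.5, 1.2]])
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
move = lambda p: 1.5 * p @ R_true.T + np.array([2.0, -1.0])
recovered = remove_global_motion(move(neutral_feat), move(neutral_refs), neutral_refs)
```

After the global motion is removed, the residual displacement of each feature point relative to its neutral position is what a FAP-style parameter would be computed from; the normalization units for that step are defined in the MPEG-4 standard and are not reproduced here.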