Automatic Estimation of Visual Speech Parameters

Abstract: Visual speech parameter estimation plays an important role in visual speech research. This paper selects 24 parameters directly related to articulation from the facial animation parameters (FAPs) defined in MPEG-4 to describe visual speech. By combining statistical learning with rule-based methods, the lip contour and facial feature points are tracked using the probability distribution of facial color together with prior knowledge of shape and edges, which yields fairly accurate tracking. After high-frequency noise in the reference-point tracks is removed with a low-pass filter, the main face pose is estimated from the four most prominent reference points on the face, which removes the effect of global head motion. Finally, accurate visual speech parameters are computed from the motion of the facial feature points; these parameters have already been used in related applications.
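The last two steps of the abstract (low-pass filtering the reference-point tracks, then removing global motion using four reference points) can be illustrated with a minimal sketch. This record does not give the paper's actual filter or pose model, so everything below is an assumption made for illustration: a moving-average filter stands in for the low-pass filter, a least-squares 2D similarity transform (scale, rotation, translation) stands in for the pose estimate, and the function names `smooth_track`, `fit_similarity`, and `remove_global_motion` are hypothetical.

```python
import numpy as np


def smooth_track(track, window=5):
    """Moving-average low-pass filter over a (T, 2) point trajectory.

    Assumes an odd window; the ends are padded by repeating the edge
    samples so the output has the same length as the input.
    """
    pad = window // 2
    padded = np.pad(track, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(window) / window
    columns = [np.convolve(padded[:, k], kernel, mode="valid")
               for k in range(track.shape[1])]
    return np.stack(columns, axis=1)


def fit_similarity(ref_neutral, ref_current):
    """Least-squares fit of current ~= a * neutral + b for 2D points.

    Points are treated as complex numbers, so the complex factor `a`
    encodes scale and rotation and `b` encodes translation.
    """
    z0 = ref_neutral[:, 0] + 1j * ref_neutral[:, 1]
    z1 = ref_current[:, 0] + 1j * ref_current[:, 1]
    c0, c1 = z0 - z0.mean(), z1 - z1.mean()
    a = np.vdot(c0, c1) / np.vdot(c0, c0)  # vdot conjugates its first argument
    b = z1.mean() - a * z0.mean()
    return a, b


def remove_global_motion(points, a, b):
    """Map points from the current frame back to the neutral frame."""
    z = (points[:, 0] + 1j * points[:, 1] - b) / a
    return np.column_stack([z.real, z.imag])


# Hypothetical data: four reference points and three lip points (neutral frame).
ref = np.array([[0.0, 0.0], [4.0, 0.0], [2.0, 3.0], [2.0, -1.0]])
lips = np.array([[1.0, -2.0], [3.0, -2.0], [2.0, -2.5]])

# Simulate a global head motion: rotate, scale, and translate every point.
theta, scale, shift = 0.3, 1.2, np.array([5.0, -2.0])
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])


def move(p):
    return scale * p @ rot.T + shift


a, b = fit_similarity(ref, move(ref))
lips_neutral = remove_global_motion(move(lips), a, b)
```

In this sketch, whatever displacement remains in the feature points after `remove_global_motion` is attributed to local facial movement, which is the quantity a FAP-style parameter would then measure. A real system would more likely estimate 3D pose; the 2D similarity model is only the simplest case that shows the idea.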

Authors: Wang Zhiming (王志明), Cai Lianhong (蔡莲红), Ai Haizhou (艾海舟)

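Affiliations: Department of Computer Science and Technology, University of Science and Technology Beijing; Department of Computer Science and Technology, Tsinghua University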

Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list; 2005, No. 7, pp. 1185-1190 (6 pages)

Funding: Specialized Research Fund for the Doctoral Program of Higher Education (20010003049); University of Science and Technology Beijing Foundation (20040509190)

Keywords: visual speech; facial animation parameter (FAP); Gaussian mixture model (GMM); deformable template

Classification code: TP391 [Automation and Computer Technology: Computer Application Technology]
