A Dynamic Viseme Model and Parameter Estimation (动态视位模型及其参数估计)

Cited by: 8

Abstract: Visual information can improve speech perception, but generating realistic, natural mouth shapes in visual speech synthesis is a complex problem. After an in-depth study of how mouth shape changes during speaking, this paper proposes a dynamic viseme model based on blending dominance (control) functions. Taking the characteristics of Chinese pronunciation into account, it also gives a systematic method for learning the model parameters from training data, which is more reliable than setting the parameters by hand from subjective experience. Experimental results show that the viseme model, with parameters learned from training data, can effectively describe how mouth shape changes during Chinese speech.
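The dominance-blending idea named in the abstract can be illustrated with a minimal sketch. It assumes the negative-exponential dominance function commonly attributed to Cohen and Massaro (reference [1]), not the specific control functions or learned parameters of this paper, which the abstract does not give. The names `dominance` and `blend`, and all numeric values, are hypothetical.

```python
import math


def dominance(t, center, alpha, theta, c=1.0):
    """Dominance of one viseme at time t: peaks at `center`, decays with distance.

    Assumed form: alpha * exp(-theta * |t - center| ** c).
    """
    return alpha * math.exp(-theta * abs(t - center) ** c)


def blend(t, visemes):
    """Mouth-shape parameter at time t as a dominance-weighted average of targets."""
    weights = [dominance(t, v["center"], v["alpha"], v["theta"]) for v in visemes]
    total = sum(weights)
    return sum(w * v["target"] for w, v in zip(weights, visemes)) / total


# Two hypothetical visemes: target lip opening (arbitrary units) and timing in seconds.
visemes = [
    {"target": 0.2, "center": 0.10, "alpha": 1.0, "theta": 20.0},
    {"target": 0.8, "center": 0.30, "alpha": 1.0, "theta": 20.0},
]

# Sample the blended trajectory every 50 ms from 0.0 s to 0.4 s.
trajectory = [blend(i * 0.05, visemes) for i in range(9)]
```

In a sketch like this, `alpha` and `theta` are the kind of per-viseme parameters that could either be set by hand or fitted to recorded lip-movement data; the abstract argues for the latter.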

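Authors: Wang Zhiming (王志明), Cai Lianhong (蔡莲红)

Affiliation: Department of Computer Science and Technology, Tsinghua University

Source: Journal of Software (《软件学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2003, No. 3, pp. 461-466 (6 pages)

Funding: Supported by the National Research Foundation for the Doctoral Program of Higher Education of China under Grant No. 20010003049 (Ministry of Education doctoral program fund)

Keywords: dynamic viseme model; parameter estimation; visual speech; viseme; static viseme; dynamic viseme; co-articulation; speech synthesis; visual information

CLC number: TN912.34 [Electronics and Telecommunications: Communication and Information Systems]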

Citation network
Related literature

References (10)

[1] Cohen MM, Massaro DW. Modeling coarticulation in synthetic visual speech. In: Thalmann NM, Thalmann D, eds. Models and Techniques in Computer Animation. Tokyo: Springer-Verlag, 1993. 139-156.
[2] Reveret L, Bailly G, Badin P. Mother: a new generation of talking heads providing a flexible articulatory control for video-realistic speech animation. In: Yuan BZ, Huang TY, Tang XF, eds. Proceedings of the 6th International Conference on Spoken Language Processing (Ⅱ). Beijing: China Military Friendship Publish, 2000. 755-758.
[3] Brooke NM, Scott SD. Computer graphics animations of talking faces based on stochastic models. In: International Symposium on Speech, Image Processing and Neural Networks. 1994. 73-76.
[4] Masuko T, Kobayashi T, Tamura M. Text-to-Visual speech synthesis based on parameter generation from HMM. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (Ⅵ). 1998. 3745-3748.
[5] Bregler C, Covell M, Slaney M. Video rewrite: driving visual speech with audio. In: Proceedings of the ACM SIGGRAPH Conference on Computer Graphics. 1997. 353-360.
[6] Cosatto E, Potamianos G, Graf HP. Audio-Visual unit selection for the synthesis of photo-realistic talking-heads. In: IEEE International Conference on Multimedia and Expo (Ⅱ). 2000. 619-622.
[7] Steve M, Andrew B. Modeling visual coarticulation in synthetic talking heads using a lip motion unit inventory with concatenative synthesis. In: Yuan BZ, Huang TY, Tang XF, eds. Proceedings of the 6th International Conference on Spoken Language Processing (Ⅱ). Beijing: China Military Friendship Publish, 2000. 759-762.
[8] International Standard. Information technology-coding of audio-visual objects (Part 2). Visual; Amendment 1: Visual extensions, ISO/IEC 14496-2: 1999/Amd.1:2000(E).
[9] Zhong J, Olive J. Cloning synthetic talking heads. In: Proceedings of the 3rd ESCA/COCOSDA Workshop on Speech Synthesis. 1998. 26-29.
[10] Le Goff B, Benoit C. A text-to-audiovisual-speech synthesizer for French. In: Proceedings of the 4th International Conference on Spoken Language Processing (Ⅳ). 1996. 2163-2166.

Co-cited documents (39)

1. 曹剑芬.普通话双音子和三音子结构系统代表语料集[J].语言文字应用,1997(1):62-70. Cited by: 7
2. 陶建华,谭铁牛.数字化人类情感——和谐人机交互环境中的情感计算[J].微电脑世界,2004(1):29-32. Cited by: 12
3. 左力,李治国,李锦涛,高文.基于标注图像的MPEG-4人脸运动参数获取方法[J].系统仿真学报,2001,13(S2):497-501. Cited by: 1
4. 徐向华,朱杰,郭强.汉语连续语音识别中的分级聚类算法的研究和应用[J].信号处理,2004,20(5):497-500. Cited by: 2
5. 王志明,蔡莲红,艾海舟.基于数据驱动方法的汉语文本-可视语音合成(英文)[J].软件学报,2005,16(6):1054-1063. Cited by: 16
6. BOTHE H H, WIEDEN E A. A neurofuzzy approach for modeling lips movements [C]// Proceedings of the Third IEEE Conference on Computational Intelligence. [S.l.]: IEEE Press, 1994, 1: 234-237.
7. BREGLER C, COVELL M, SLANEY M. Video rewrite: driving visual speech with audio [C]// Proceedings of SIGGRAPH 97. [S.l.]: ACM SIGGRAPH, 1997: 353-360.
8. Cosatto E, Ostermann J, Graf H P, et al. Lifelike talking faces for interactive services [J]. Proceedings of the IEEE, 2003, 91(9): 1406-1429.
9. TANG Hao, FU Yun, TU Jilin, et al. Humanoid audio-visual avatar with emotive text-to-speech synthesis [J]. IEEE Transactions on Multimedia, 2008, 10(6): 969-981.
10. WU Zhiyong, ZHANG Shen, CAI Lianhong, et al. Real-time Synthesis of Chinese Visual Speech and Facial Expressions using MPEG-4 FAP Features in a Three-dimensional Avatar [C]// The International Conference on Spoken Language Processing, Pittsburgh, 2006: 1802-1805.

Citing documents (8)

1. 李爱军,张利刚,李洋,孟昭鹏,王霞.汉语口语对话中姿态与语音信息关系初探[J].清华大学学报（自然科学版）,2008,48(S1):627-634.
2. 郑红娜,朱云,王岚,陈辉.汉语三维发音动作合成和动态模拟[J].集成技术,2013,2(1):23-28. Cited by: 1
3. 王志明,陶建华.文本-视觉语音合成综述[J].计算机研究与发展,2006,43(1):145-152. Cited by: 5
4. 张小凤,杨卫英,蔡方方,田超.汉语复韵母的三维动态视位模型[J].电声技术,2009,33(12):54-57. Cited by: 3
5. 李冰锋,谢磊,周祥增,付中华,张艳宁.实时语音驱动的虚拟说话人[J].清华大学学报（自然科学版）,2011,51(9):1180-1186. Cited by: 2
6. 李皓,陈艳艳,唐朝京.唇部子运动与权重函数表征的汉语动态视位[J].信号处理,2012,28(3):322-328. Cited by: 12
7. 曹亮,赵晖.具有情感表现力的可视语音合成研究综述[J].计算机工程与科学,2015,37(4):813-818. Cited by: 3
8. 刘学杰,赵晖.改进参数控制的可视语音合成方法[J].计算机工程与设计,2017,38(4):989-995.

Second-level citing documents (23)

1. 邵艳秋,穗志方,韩纪庆,王志伟.小规模情感数据和大规模中性数据相结合的情感韵律建模研究[J].计算机研究与发展,2007,44(9):1624-1631.
2. 张小燕,宿建军,薛化建,王磊.维吾尔语语音识别语料库中的OOV研究[J].计算机工程与设计,2012,33(2):772-776. Cited by: 4
3. 李冰锋,谢磊,朱鹏程,樊博.语音驱动虚拟说话人的自然头动生成[J].清华大学学报（自然科学版）,2013,53(6):898-902.
4. 曾洪鑫,胡东波,胡志刚.文本与朗读语音共同驱动的汉语语音与口型匹配方案[J].计算机与现代化,2013(10):135-137. Cited by: 1
5. 曾洪鑫,胡东波,胡志刚.浅析汉语语音与口型匹配的基本机理[J].电声技术,2013,37(10):44-48.
6. 於俊,汪增福.面向人机接口的多种输入驱动的三维虚拟人头[J].计算机学报,2013,36(12):2525-2536. Cited by: 2
7. 贾熹滨,尹宝才,孙艳丰.基于双层码本的语音驱动视觉语音合成系统[J].计算机科学,2014,41(1):100-104. Cited by: 2
8. 赵勇,蒋冬梅,Sahli Hichem.基于状态异步DBN的语音驱动面部动画合成[J].计算机工程,2014,40(2):180-183. Cited by: 1
9. 吴翠娟,赵晖.可视化协同发音合成研究综述[J].现代计算机,2014,20(9):9-14.
10. 曾洪鑫,胡东波,胡志刚.双模态驱动的汉语语音与口型匹配控制模型[J].计算机工程与应用,2015,51(3):202-207. Cited by: 1

1. 张小凤,杨卫英,蔡方方,田超.汉语复韵母的三维动态视位模型[J].电声技术,2009,33(12):54-57. Cited by: 3
2. 张瑞琦,王雅弘,黄立维,许伯仲.WiMAX网络下动态语音与影像串流服务之策略应用[J].电子与电脑,2009,9(4):78-82.
3. 毕沧起,汪杰.家居影音体验——10套家庭影院系统搭配组合方案[J].数字生活,2008,0(11):52-71.
4. 编辑部.家庭影院系统搭配解决方案[J].家庭影院技术,2008(9):23-42.
5. 刘颖,王成儒.用于人脸动画的语音特征提取算法研究[J].电声技术,2008,32(12):49-53. Cited by: 2
6. 等离子电视机选购要点[J].科学时代,2005(03S):62-63.
7. 潘永波,戴义保,谢春建.Windows2000/NT下串口通信实时问题的解决[J].仪器仪表用户,2004,11(5):59-61.
8. 林栋.ASON组网中的保护恢复机制[J].电信网技术,2006(3):31-34. Cited by: 2
9. 钟晓,周昌乐,俞瑞钊.一种面向汉语语音识别的口形形状识别方法[J].软件学报,1999,10(2):205-209. Cited by: 6
10. 翟旭平,杨兵兵,孟田.基于PCA和混合核函数QPSO_SVM频谱感知算法[J].电子测量技术,2016,39(9):87-90. Cited by: 13
