
A Review of Text-to-Visual Speech Synthesis (cited by: 5)
Abstract: Visual information is important for understanding speech. Not only hearing-impaired people but also people with normal hearing make use of the visual information that accompanies speech, especially when the acoustic signal is degraded by noise. Just as text-to-speech (TTS) synthesis enables a computer to speak like a human, text-to-visual speech (TTVS) synthesis via computer face animation can bring the bimodality of speech into the human-computer interface and make it friendlier. This paper reviews the state of the art in text-to-visual speech synthesis. Approaches to visual speech synthesis fall into two classes: parameter control and data driven. For the parameter control approach, three key problems are discussed: face model construction, definition of animation control parameters, and the dynamic properties of the control parameters. For the data-driven approach, three main methods are introduced: video slice concatenation, key frame morphing, and face component combination. Finally, the advantages and disadvantages of the two approaches, and the settings each is suited to, are compared.
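To make the parameter control approach concrete, the sketch below blends viseme targets with negative-exponential dominance functions in the style of the Cohen-Massaro coarticulation model the survey discusses. All parameter names and viseme values here are hypothetical placeholders, not taken from the paper:

```python
import math

# Hypothetical 2-D mouth-shape targets (lip width, lip opening) for a few visemes.
VISEMES = {
    "sil": (0.50, 0.00),
    "aa":  (0.60, 0.80),
    "oo":  (0.20, 0.40),
    "mm":  (0.55, 0.00),
}

def dominance(t, center, magnitude=1.0, rate=4.0):
    """Negative-exponential dominance of a speech segment centered at `center`,
    evaluated at time t (Cohen-Massaro style)."""
    return magnitude * math.exp(-rate * abs(t - center))

def blend(t, segments):
    """Dominance-weighted average of viseme targets at time t.

    `segments` is a list of (viseme_name, center_time) pairs; each segment's
    influence decays exponentially with distance from its center, so adjacent
    visemes shape each other (coarticulation).
    """
    weights = [dominance(t, center) for _, center in segments]
    total = sum(weights)
    targets = [VISEMES[name] for name, _ in segments]
    return tuple(
        sum(w * p[i] for w, p in zip(weights, targets)) / total
        for i in range(2)
    )
```

Halfway between an "aa" centered at t=0.0 and an "oo" centered at t=1.0, both dominance functions contribute equally, so `blend(0.5, ...)` returns the midpoint of the two targets; closer to either center, that viseme dominates.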
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI and CSCD, Peking University core journal, 2006, Issue 1, pp. 145-152 (8 pages).
Funding: Research Fund of the University of Science and Technology Beijing (20040509190); Open Project Fund of the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences.
Keywords: text-to-visual speech (TTVS); viseme; co-articulation; face model; facial animation

References (2)

Secondary References (13)

[1] Cohen MM, Massaro DW. Modeling coarticulation in synthetic visual speech. In: Thalmann NM, Thalmann D, eds. Models and Techniques in Computer Animation. Tokyo: Springer-Verlag, 1993. 139-156.
[2] Reveret L, Bailly G, Badin P. MOTHER: A new generation of talking heads providing a flexible articulatory control for video-realistic speech animation. In: Yuan BZ, Huang TY, Tang XF, eds. Proceedings of the 6th International Conference on Spoken Language Processing (II). Beijing: China Military Friendship Publish, 2000. 755-758.
[3] Brooke NM, Scott SD. Computer graphics animations of talking faces based on stochastic models. In: International Symposium on Speech, Image Processing and Neural Networks. 1994. 73-76.
[4] Masuko T, Kobayashi T, Tamura M. Text-to-visual speech synthesis based on parameter generation from HMM. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (VI). 1998. 3745-3748.
[5] Bregler C, Covell M, Slaney M. Video Rewrite: Driving visual speech with audio. In: Proceedings of the ACM SIGGRAPH Conference on Computer Graphics. 1997. 353-360.
[6] Cosatto E, Potamianos G, Graf HP. Audio-visual unit selection for the synthesis of photo-realistic talking-heads. In: IEEE International Conference on Multimedia and Expo (II). 2000. 619-622.
[7] Minnis S, Breen A. Modeling visual coarticulation in synthetic talking heads using a lip motion unit inventory with concatenative synthesis. In: Yuan BZ, Huang TY, Tang XF, eds. Proceedings of the 6th International Conference on Spoken Language Processing (II). Beijing: China Military Friendship Publish, 2000. 759-762.
[8] International Standard. Information technology - Coding of audio-visual objects - Part 2: Visual; Amendment 1: Visual extensions. ISO/IEC 14496-2:1999/Amd.1:2000(E).
[9] Zhong J, Olive J. Cloning synthetic talking heads. In: Proceedings of the 3rd ESCA/COCOSDA Workshop on Speech Synthesis. 1998. 26-29.
[10] Le Goff B, Benoit C. A text-to-audiovisual-speech synthesizer for French. In: Proceedings of the 4th International Conference on Spoken Language Processing (IV). 1996. 2163-2166.

