
A Review of Text-to-Visual Speech Synthesis (cited by: 5)
Abstract: Visual information is important for understanding speech. Not only hearing-impaired people but also people with normal hearing make use of the visual information that accompanies speech, especially when the acoustic signal is degraded by noise. Just as text-to-speech (TTS) synthesis enables a computer to speak like a human, text-to-visual speech (TTVS) synthesis via computer face animation can bring the bimodality of speech into the human-computer interface and make it friendlier. This paper reviews the state of the art in text-to-visual speech synthesis. Approaches to visual speech synthesis fall into two classes: parameter control and data driven. For the parameter control approach, three key problems are discussed: face model construction, definition of animation control parameters, and the dynamic properties of the control parameters. For the data-driven approach, three main methods are introduced: video slice concatenation, key frame morphing, and face component combination. Finally, the advantages and disadvantages of the two approaches, and the settings each is suited to, are compared.
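To make the parameter control approach concrete, the sketch below blends viseme targets with negative-exponential dominance functions in the style of the Cohen-Massaro coarticulation model the survey discusses. All parameter names and viseme values here are hypothetical placeholders, not taken from the paper:

```python
import math

# Hypothetical 2-D mouth-shape targets (lip width, lip opening) for a few visemes.
VISEMES = {
    "sil": (0.50, 0.00),
    "aa":  (0.60, 0.80),
    "oo":  (0.20, 0.40),
    "mm":  (0.55, 0.00),
}

def dominance(t, center, magnitude=1.0, rate=4.0):
    """Negative-exponential dominance of a speech segment centered at `center`,
    evaluated at time t (Cohen-Massaro style)."""
    return magnitude * math.exp(-rate * abs(t - center))

def blend(t, segments):
    """Dominance-weighted average of viseme targets at time t.

    `segments` is a list of (viseme_name, center_time) pairs; each segment's
    influence decays exponentially with distance from its center, so adjacent
    visemes shape each other (coarticulation).
    """
    weights = [dominance(t, center) for _, center in segments]
    total = sum(weights)
    targets = [VISEMES[name] for name, _ in segments]
    return tuple(
        sum(w * p[i] for w, p in zip(weights, targets)) / total
        for i in range(2)
    )
```

Halfway between an "aa" centered at t=0.0 and an "oo" centered at t=1.0, both dominance functions contribute equally, so `blend(0.5, ...)` returns the midpoint of the two targets; closer to either center, that viseme dominates.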
Source: Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI and CSCD, Peking University core journal, 2006, Issue 1, pp. 145-152 (8 pages).
Funding: Research Fund of the University of Science and Technology Beijing (20040509190); Open Project Fund of the National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences.
Keywords: text-to-visual speech (TTVS); viseme; co-articulation; face model; facial animation

References (2)

Secondary References (13)

[1] Cohen MM, Massaro DW. Modeling coarticulation in synthetic visual speech. In: Thalmann NM, Thalmann D, eds. Models and Techniques in Computer Animation. Tokyo: Springer-Verlag, 1993. 139-156.
[2] Reveret L, Bailly G, Badin P. MOTHER: A new generation of talking heads providing a flexible articulatory control for video-realistic speech animation. In: Yuan BZ, Huang TY, Tang XF, eds. Proceedings of the 6th International Conference on Spoken Language Processing (II). Beijing: China Military Friendship Publish, 2000. 755-758.
[3] Brooke NM, Scott SD. Computer graphics animations of talking faces based on stochastic models. In: International Symposium on Speech, Image Processing and Neural Networks. 1994. 73-76.
[4] Masuko T, Kobayashi T, Tamura M. Text-to-visual speech synthesis based on parameter generation from HMM. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (VI). 1998. 3745-3748.
[5] Bregler C, Covell M, Slaney M. Video Rewrite: Driving visual speech with audio. In: Proceedings of the ACM SIGGRAPH Conference on Computer Graphics. 1997. 353-360.
[6] Cosatto E, Potamianos G, Graf HP. Audio-visual unit selection for the synthesis of photo-realistic talking-heads. In: IEEE International Conference on Multimedia and Expo (II). 2000. 619-622.
[7] Minnis S, Breen A. Modeling visual coarticulation in synthetic talking heads using a lip motion unit inventory with concatenative synthesis. In: Yuan BZ, Huang TY, Tang XF, eds. Proceedings of the 6th International Conference on Spoken Language Processing (II). Beijing: China Military Friendship Publish, 2000. 759-762.
[8] International Standard. Information technology - Coding of audio-visual objects - Part 2: Visual; Amendment 1: Visual extensions. ISO/IEC 14496-2:1999/Amd.1:2000(E).
[9] Zhong J, Olive J. Cloning synthetic talking heads. In: Proceedings of the 3rd ESCA/COCOSDA Workshop on Speech Synthesis. 1998. 26-29.
[10] Le Goff B, Benoit C. A text-to-audiovisual-speech synthesizer for French. In: Proceedings of the 4th International Conference on Spoken Language Processing (IV). 1996. 2163-2166.

