
Chinese 3D Articulatory Movement Synthesis and Animation (cited by: 1)
Abstract: Motivated by speech rehabilitation for hearing-impaired children, this paper derives the texts such children most often mispronounce, together with pairs of easily confused pronunciations, from a large collection of their recorded speech. A data-driven 3D talking-head articulation system is then presented. Driven by articulatory movements recorded with an EMA AG500 electromagnetic articulograph, the system realistically simulates Mandarin articulation, letting hearing-impaired children observe the speaker's lip and tongue motions during pronunciation so as to support pronunciation training and correct typical errors. A perceptual evaluation shows that the 3D talking head effectively animates the articulators both inside and outside the mouth. In addition, a phoneme-based CM coarticulation model is used to synthesize the articulatory movements for the error-prone texts; the root-mean-square (RMS) error between the synthesized and the measured movements averages 1.25 mm.
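The abstract names a phoneme-based CM (Cohen-Massaro) coarticulation model and an RMS comparison against EMA recordings, but gives no formulas. The sketch below is a minimal Python illustration of the standard CM dominance-function blending and of the RMS measure; the function names, parameter values, and toy data are assumptions for illustration, not the authors' (modified) implementation.

import numpy as np

def dominance(t, center, alpha, theta, c=1.0):
    # CM dominance function: alpha * exp(-theta * |t - center|**c)
    return alpha * np.exp(-theta * np.abs(t - center) ** c)

def cm_synthesize(times, segments):
    # Blend per-phoneme articulatory targets into one trajectory.
    # segments: (center_time_s, target_mm, alpha, theta) per phoneme (assumed).
    num = np.zeros_like(times)
    den = np.zeros_like(times)
    for center, target, alpha, theta in segments:
        d = dominance(times, center, alpha, theta)
        num += d * target               # dominance-weighted target sum
        den += d
    return num / np.maximum(den, 1e-9)  # weighted average at each instant

def rms_error(synth, real):
    # Root-mean-square distance between synthetic and measured (EMA) tracks.
    return float(np.sqrt(np.mean((synth - real) ** 2)))

# Toy usage: one tongue-sensor coordinate (mm) over a 1 s, 3-phoneme utterance.
t = np.linspace(0.0, 1.0, 200)
phones = [(0.15, 2.0, 1.0, 8.0), (0.50, -1.5, 1.0, 10.0), (0.85, 3.0, 1.0, 8.0)]
synth = cm_synthesize(t, phones)
real = synth + np.random.normal(0.0, 1.0, t.shape)  # stand-in for EMA data
print("RMS error (mm): %.2f" % rms_error(synth, real))

On real data the RMS would presumably be computed per EMA sensor coordinate and averaged over all test utterances, which is how a single figure such as the reported 1.25 mm mean arises.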
Source: Journal of Integration Technology (《集成技术》), 2013, No. 1: 23-28 (6 pages).
Funding: National Natural Science Foundation of China (NSFC 61135003, NSFC 90920002); Knowledge Innovation Program of the Chinese Academy of Sciences (KJCXZ-YW-617).
Keywords: error-prone pronunciation texts of hearing-impaired children; 3D talking head; CM coarticulation model; electromagnetic articulography (EMA); Dirichlet Free-Form Deformation (DFFD) algorithm

References (15 total; first 10 shown)

  • 1 Baker A E. The development of phonology in the blind child. In: Hearing by Eye: The Psychology of Lip-Reading, 1987: 145-161.
  • 2 Sumby W H, Pollack I. Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 1954: 212-215.
  • 3 Stoel-Gammon C. Prelinguistic vocalizations of hearing-impaired and normally hearing subjects: a comparison of consonantal inventories. Journal of Speech and Hearing Disorders, 1988: 302-315.
  • 4 Mulford R. First words of the blind child. In: The Child's Development of a Linguistic Vocabulary. New York: Academic Press, 1988: 293-338.
  • 5 Benoit C, Le Goff B. Audio-visual speech synthesis from French text: eight years of models, designs and evaluation at the ICP. Speech Communication, 1998: 117-129.
  • 6 Pfitzinger H. Concatenative speech synthesis with articulatory kinematics obtained via three-dimensional electro-magnetic articulography. Fortschritte der Akustik, 2005(2): 769-770.
  • 7 Heracleous P, Hagita N. Automatic recognition of speech without any audio information. 2011.
  • 8 Serrurier A, Badin P. A three-dimensional articulatory model of the velum and nasopharyngeal wall based on MRI and CT data. Journal of the Acoustical Society of America, 2008(4): 2335-2355. doi:10.1121/1.2875111.
  • 9 Badin P, Elisei F. An audiovisual talking head for augmented speech generation: models and animations based on a real speaker's articulatory data. Lecture Notes in Computer Science, 2008: 132-143.
  • 10 Cohen M M, Massaro D W. Modeling coarticulation in synthetic visual speech. In: Models and Techniques in Computer Animation, 1993: 139-156.

