
Chinese 3D Articulatory Movement Synthesis and Animation (cited by: 1)
Abstract: Motivated by speech rehabilitation for hearing-impaired children, this paper derives the texts such children most often mispronounce, together with pairs of easily confused pronunciations, from a large collection of their recorded speech. A data-driven 3D talking-head articulation system is then presented. Driven by articulatory movements recorded with an EMA AG500 electromagnetic articulograph, the system realistically simulates Mandarin articulation, letting hearing-impaired children observe the speaker's lip and tongue motions during pronunciation so as to support pronunciation training and correct typical errors. A perceptual evaluation shows that the 3D talking head effectively animates the articulators both inside and outside the mouth. In addition, a phoneme-based CM coarticulation model is used to synthesize the articulatory movements for the error-prone texts; the root-mean-square (RMS) error between the synthesized and the measured movements averages 1.25 mm.
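The abstract names a phoneme-based CM (Cohen-Massaro) coarticulation model and an RMS comparison against EMA recordings, but gives no formulas. The sketch below is a minimal Python illustration of the standard CM dominance-function blending and of the RMS measure; the function names, parameter values, and toy data are assumptions for illustration, not the authors' (modified) implementation.

import numpy as np

def dominance(t, center, alpha, theta, c=1.0):
    # CM dominance function: alpha * exp(-theta * |t - center|**c)
    return alpha * np.exp(-theta * np.abs(t - center) ** c)

def cm_synthesize(times, segments):
    # Blend per-phoneme articulatory targets into one trajectory.
    # segments: (center_time_s, target_mm, alpha, theta) per phoneme (assumed).
    num = np.zeros_like(times)
    den = np.zeros_like(times)
    for center, target, alpha, theta in segments:
        d = dominance(times, center, alpha, theta)
        num += d * target               # dominance-weighted target sum
        den += d
    return num / np.maximum(den, 1e-9)  # weighted average at each instant

def rms_error(synth, real):
    # Root-mean-square distance between synthetic and measured (EMA) tracks.
    return float(np.sqrt(np.mean((synth - real) ** 2)))

# Toy usage: one tongue-sensor coordinate (mm) over a 1 s, 3-phoneme utterance.
t = np.linspace(0.0, 1.0, 200)
phones = [(0.15, 2.0, 1.0, 8.0), (0.50, -1.5, 1.0, 10.0), (0.85, 3.0, 1.0, 8.0)]
synth = cm_synthesize(t, phones)
real = synth + np.random.normal(0.0, 1.0, t.shape)  # stand-in for EMA data
print("RMS error (mm): %.2f" % rms_error(synth, real))

On real data the RMS would presumably be computed per EMA sensor coordinate and averaged over all test utterances, which is how a single figure such as the reported 1.25 mm mean arises.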
Source: Journal of Integration Technology (《集成技术》), 2013, No. 1: 23-28 (6 pages).
Funding: National Natural Science Foundation of China (NSFC 61135003, NSFC 90920002); Knowledge Innovation Program of the Chinese Academy of Sciences (KJCXZ-YW-617).
Keywords: error-prone pronunciation texts of hearing-impaired children; 3D talking head; CM coarticulation model; electromagnetic articulography (EMA); Dirichlet Free-Form Deformation (DFFD) algorithm

References (15 total; first 10 shown)

  • 1 Baker A E. The development of phonology in the blind child. In: Hearing by Eye: The Psychology of Lip-Reading, 1987: 145-161.
  • 2 Sumby W H, Pollack I. Visual contribution to speech intelligibility in noise. Journal of the Acoustical Society of America, 1954: 212-215.
  • 3 Stoel-Gammon C. Prelinguistic vocalizations of hearing-impaired and normally hearing subjects: a comparison of consonantal inventories. Journal of Speech and Hearing Disorders, 1988: 302-315.
  • 4 Mulford R. First words of the blind child. In: The Child's Development of a Linguistic Vocabulary. New York: Academic Press, 1988: 293-338.
  • 5 Benoit C, Le Goff B. Audio-visual speech synthesis from French text: eight years of models, designs and evaluation at the ICP. Speech Communication, 1998: 117-129.
  • 6 Pfitzinger H. Concatenative speech synthesis with articulatory kinematics obtained via three-dimensional electro-magnetic articulography. Fortschritte der Akustik, 2005(2): 769-770.
  • 7 Heracleous P, Hagita N. Automatic recognition of speech without any audio information. 2011.
  • 8 Serrurier A, Badin P. A three-dimensional articulatory model of the velum and nasopharyngeal wall based on MRI and CT data. Journal of the Acoustical Society of America, 2008(4): 2335-2355. doi:10.1121/1.2875111.
  • 9 Badin P, Elisei F. An audiovisual talking head for augmented speech generation: models and animations based on a real speaker's articulatory data. Lecture Notes in Computer Science, 2008: 132-143.
  • 10 Cohen M M, Massaro D W. Modeling coarticulation in synthetic visual speech. In: Models and Techniques in Computer Animation, 1993: 139-156.

