Improved visual speech synthesis method of parameter control (改进参数控制的可视语音合成方法)

Abstract: Traditional monophone-based methods handle coarticulation within a syllable and coarticulation between syllables in the same way. To address this, the paper analyzes the different effects that these two kinds of coarticulation have on visual speech synthesis and proposes an improved visual speech synthesis method based on monophone parameter control. For each syllable, the amplitude of the weight function at the viseme peak and its rate of change are left unchanged for both vowel and consonant visemes; only the time parameters of the vowels and consonants are modified, so that the modified viseme parameters better simulate the dynamic behavior of the articulators during real syllable pronunciation. The feasibility of the method was validated in a practical application. Experimental results show that the improved method effectively solves the intra-syllable coarticulation problem and improves the quality of visual speech synthesis.
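
The abstract does not give the weight function itself, so the following is only a minimal sketch of the idea, assuming an exponential weight of the form amplitude * exp(-rate * |t - center|) and a weighted average of viseme targets. The function names, the dictionary fields, and every numeric value are hypothetical and are not taken from the paper. The sketch shows what it means to change only the time parameter of a viseme while its peak amplitude and rate of change stay fixed.

```python
import math

def dominance(t, center, amplitude, rate):
    """Assumed weight function: peaks at `center` with height `amplitude`
    and decays exponentially at `rate` on both sides of the peak."""
    return amplitude * math.exp(-rate * abs(t - center))

def blend(t, visemes):
    """Mouth parameter at time t as the weight-normalized average of viseme targets."""
    weights = [dominance(t, v["center"], v["amplitude"], v["rate"]) for v in visemes]
    return sum(w * v["target"] for w, v in zip(weights, visemes)) / sum(weights)

def shift_time(viseme, delta):
    """Return a copy in which only the time parameter (peak position) is changed."""
    return {**viseme, "center": viseme["center"] + delta}

# Hypothetical consonant-vowel syllable; targets are arbitrary lip-opening values.
consonant = {"target": 0.1, "center": 0.05, "amplitude": 1.0, "rate": 20.0}
vowel = {"target": 0.8, "center": 0.20, "amplitude": 1.0, "rate": 10.0}

# Illustrative shift: move the vowel peak 0.05 s earlier inside the syllable.
shifted_vowel = shift_time(vowel, -0.05)

baseline = blend(0.10, [consonant, vowel])
adjusted = blend(0.10, [consonant, shifted_vowel])
print(f"baseline={baseline:.3f} adjusted={adjusted:.3f}")
```

In this sketch, moving the vowel peak earlier raises the vowel's weight at t = 0.10, so the blended parameter moves toward the vowel target sooner even though the vowel's amplitude and rate are untouched. The direction and size of the shift are illustrative only; the paper determines its time parameters per syllable.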

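Authors: 刘学杰 (Liu Xuejie), 赵晖 (Zhao Hui)

Affiliation: School of Information Science and Engineering, Xinjiang University (新疆大学信息科学与工程学院)

Source: Computer Engineering and Design (《计算机工程与设计》, Peking University Core Journal), 2017, No. 4, pp. 989-995 (7 pages)

Funding: National Natural Science Foundation of China (61261037, 61561047)

Keywords: visual speech synthesis; monophone parameter control; Uyghur; viseme; coarticulation

Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]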

