Visual Speech Synthesis Algorithm Based on Chinese Visual Triphone (基于汉语视频三音素的可视语音合成)
Abstract: To synthesize realistic video sequences, this paper proposes a visual speech synthesis method based on Chinese visual triphones. The concept of the "visual triphone" is introduced on the basis of Chinese pronunciation rules and the correspondence between phonemes and visemes. Hidden Markov Model (HMM) training and synthesis models are then built on visual triphones. Training uses combined audio and visual features, augmented with dynamic features. In synthesis, the visual triphone HMMs are concatenated into a sentence HMM, from which feature parameters are extracted to synthesize the visual speech. Subjective and objective evaluations indicate that the synthesized video is realistic and that viewer satisfaction is high.
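Authors: Zhao Hui (赵晖), Tang Chao-jing (唐朝京)
Source: Journal of Electronics & Information Technology (《电子与信息学报》; indexed in EI, CSCD, and the Peking University Core Journals list), 2009, No. 12, pp. 3010-3014 (5 pages)
Funding: Supported by a national ministry-level fund (51329060101)
Keywords: visual speech synthesis; visual triphone; Hidden Markov Model (HMM); combined features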
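
The pipeline summarized in the abstract (context-dependent units, dynamic features, and concatenation of unit HMMs into a sentence HMM) can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the `left-center+right` label format, the `sil` padding unit, the toy model bank with random means, the strictly left-to-right topology without skips, and the very simple generation step that repeats each state mean for its expected duration. The paper's actual feature extraction, model topology, and parameter generation are not described in this record.

```python
import numpy as np

def to_triphones(units, pad="sil"):
    """Expand a unit sequence into context-dependent labels 'left-center+right'."""
    padded = [pad, *units, pad]
    return [f"{l}-{c}+{r}" for l, c, r in zip(padded, padded[1:], padded[2:])]

def add_delta(static):
    """Append first-order dynamic features (central difference) to static ones.

    static: (T, D) array of per-frame features; returns a (T, 2D) array.
    """
    static = np.asarray(static, dtype=float)
    padded = np.pad(static, ((1, 1), (0, 0)), mode="edge")  # repeat edge frames
    delta = 0.5 * (padded[2:] - padded[:-2])
    return np.hstack([static, delta])

def concat_hmms(models):
    """Chain left-to-right unit HMMs into one sentence HMM.

    Each model is a dict with 'means' (n_states, D) and 'stay' (n_states,),
    the self-transition probability of each state (hypothetical layout).
    Returns stacked means and a transition matrix with one absorbing exit state.
    """
    means = np.vstack([m["means"] for m in models])
    stay = np.concatenate([m["stay"] for m in models])
    n = len(stay)
    trans = np.zeros((n + 1, n + 1))
    idx = np.arange(n)
    trans[idx, idx] = stay            # remain in the current state
    trans[idx, idx + 1] = 1.0 - stay  # advance to the next state (or exit)
    trans[n, n] = 1.0                 # absorbing exit state
    return means, trans

def generate(means, trans):
    """Emit each state mean for its expected duration 1 / (1 - stay)."""
    stay = np.diag(trans)[: len(means)]
    durations = np.maximum(1, np.rint(1.0 / (1.0 - stay)).astype(int))
    return np.repeat(means, durations, axis=0)

# Toy usage: four units, three states per model, four-dimensional features.
rng = np.random.default_rng(0)
labels = to_triphones(["n", "i", "h", "ao"])
bank = {lab: {"means": rng.normal(size=(3, 4)), "stay": np.full(3, 0.5)}
        for lab in labels}
means, trans = concat_hmms([bank[lab] for lab in labels])
frames = generate(means, trans)
```

In this sketch the feature vector of each state would stand in for a joint audio-visual vector. A real system would estimate the model parameters from training data and would typically use the dynamic features to smooth the generated trajectory, which the simple mean-repetition step here does not do.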

