
Lip contour description based on Fourier descriptors in speech synthesis system driven by visual-speech (cited by: 1)
Abstract: To represent automatically and quickly the lip contour features that a lipreading system requires, Fourier descriptors are applied to the description and recognition of lip contours. After movement detection and morphological processing of the image sequence, a boundary Fourier transform yields the Fourier descriptors of the lip contour in an unsymmetrical lip contour model; these descriptors capture the shape information of the contour during lip movement. The Fourier descriptor φ then serves as the lip contour feature vector in a visual-speech-driven speech synthesis system based on the hidden Markov model (HMM). Experiments on the pronunciation of isolated Chinese characters show that the first 15 or 20 Fourier descriptors alone are enough to describe the lip contour effectively and to achieve lip shape recognition.
Source: Chinese Journal of Scientific Instrument (《仪器仪表学报》), 2007, No. 8, pp. 1464-1468 (5 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Keywords: unsymmetrical lip contour model; movement detection; mathematical morphology; Fourier descriptor; hidden Markov model (HMM)
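
The abstract names the boundary Fourier transform but does not reproduce the paper's definition of φ, so the following is only a minimal sketch of one common formulation: the ordered boundary points are treated as a complex sequence, transformed with an FFT, and the low-order harmonics are kept. The function names, the synthetic test contour, and the choice of normalisation are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def synthetic_lip_contour(n=256):
    """Hypothetical test shape: a closed curve with unequal upper and lower halves."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    x = 40.0 * np.cos(t)
    # Upper half peaks at y = 15, lower half at y = -25 (an asymmetric outline).
    y = 20.0 * np.sin(t) - 5.0 * np.abs(np.sin(t))
    return np.column_stack([x, y])


def boundary_fft(contour):
    """Fourier coefficients of a closed contour given as ordered (x, y) points."""
    pts = np.asarray(contour, dtype=float)
    z = pts[:, 0] + 1j * pts[:, 1]  # boundary as a complex sequence
    return np.fft.fft(z) / len(z)


def descriptor_vector(coeffs, k=15):
    """Magnitudes of the first k harmonics (order 1, -1, 2, -2, ...).

    Skipping the DC term removes translation, taking magnitudes removes
    rotation and start-point shifts, and dividing by the larger of the two
    first-harmonic magnitudes removes scale.
    """
    if len(coeffs) <= 2 * k:
        raise ValueError("contour needs more than 2*k points")
    harmonics = [h for m in range(1, k + 1) for h in (m, -m)][:k]
    mags = np.abs(coeffs[harmonics])
    scale = max(abs(coeffs[1]), abs(coeffs[-1]))
    return mags / scale


def reconstruct(coeffs, k=15):
    """Rebuild the contour from the DC term and harmonics with |index| <= k."""
    n = len(coeffs)
    idx = np.arange(n)
    freq = np.where(idx <= n // 2, idx, idx - n)  # signed harmonic index
    kept = np.where(np.abs(freq) <= k, coeffs, 0.0)
    z = np.fft.ifft(kept * n)
    return np.column_stack([z.real, z.imag])


contour = synthetic_lip_contour()
coeffs = boundary_fft(contour)
features = descriptor_vector(coeffs, k=15)
max_error = np.abs(reconstruct(coeffs, k=15) - contour).max()
print(features.shape, max_error)
```

In a pipeline like the one the abstract describes, a vector such as `features` would be computed per video frame and the resulting sequence passed to the HMM stage; the value of `k` here simply mirrors the 15 or 20 descriptors the abstract reports as sufficient. Note that this normalisation is not invariant to the direction in which the contour is traversed.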


