
A Multi-Stream Asynchrony Model Based on Dynamic Bayesian Network for Audio-Visual Continuous Speech Recognition

(Chinese title: 基于多流动态贝叶斯网络的音视频连续语音识别)
Abstract: The asynchrony between speech and lip motion is an important issue in audio-visual speech recognition. To address it, a Multi-Stream Asynchrony Dynamic Bayesian Network (MS-ADBN) model is proposed for continuous speech recognition from audio-visual features. In this model, the audio stream and the visual stream are synchronized at the word nodes. Between word nodes, each stream has its own independent topology and its own conditional dependencies among node variables, while the word transition node variable is determined jointly by the two streams. The model therefore describes the asynchrony of the audio and visual streams at the word level. Experiments on a continuous-digit audio-visual speech database show that, under test conditions with signal-to-noise ratios ranging from 0 dB to 30 dB, the average recognition rate of the MS-ADBN model is 8.68% and 10.07% higher than those of the single-stream DBN model and the multi-stream Hidden Markov Model (HMM), respectively.
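The word-level synchronization described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives neither the network's conditional probability tables nor its sub-word units. The sketch assumes a deterministic rule in which the word changes only when both streams are in their last sub-word state and both are ready to leave it. The names `Stream`, `word_transition`, and `run`, and the state counts, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    """Position of one modality inside the current word (hypothetical structure)."""
    n_states: int  # number of sub-word states this stream uses for the word
    pos: int = 0   # index of the current sub-word state

    def at_word_end(self) -> bool:
        return self.pos == self.n_states - 1

    def advance(self, move: bool) -> None:
        # Step to the next sub-word state unless already in the last one.
        if move and not self.at_word_end():
            self.pos += 1


def word_transition(audio: Stream, video: Stream,
                    audio_exit: bool, video_exit: bool) -> bool:
    # Assumed rule: the word transition is decided jointly by both streams.
    return (audio.at_word_end() and audio_exit
            and video.at_word_end() and video_exit)


def run(audio_moves, video_moves, n_audio=3, n_video=2):
    """Return a per-frame trace of (audio_pos, video_pos, word_index)."""
    audio, video, word = Stream(n_audio), Stream(n_video), 0
    trace = []
    for a_move, v_move in zip(audio_moves, video_moves):
        trace.append((audio.pos, video.pos, word))
        if word_transition(audio, video, a_move, v_move):
            # Word boundary: both streams restart together (synchronized).
            audio, video, word = Stream(n_audio), Stream(n_video), word + 1
        else:
            # Inside a word the streams advance independently (asynchronous).
            audio.advance(a_move)
            video.advance(v_move)
    return trace


# The video stream lags the audio stream inside the first word.
trace = run([True] * 5, [False, False, True, True, True])
```

In this run the audio stream reaches its last state two frames before the video stream and waits there. The word index changes only once both streams have finished, which is the word-level constraint the abstract describes.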
Source: Journal of Northwestern Polytechnical University (《西北工业大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2008, No. 4, pp. 518-523 (6 pages)
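Funding: Supported by an international cooperation project between the Ministry of Science and Technology of China and Belgium (No. [2004].487)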
Keywords: multi-stream asynchrony, Dynamic Bayesian Network (DBN), audio-visual, speech recognition, audiovisual speech database

