Abstract
Asynchrony between speech and lip motion is important in audio-visual speech recognition. To address it, a Multi-Stream Asynchrony Dynamic Bayesian Network (MS-ADBN) model is proposed for continuous speech recognition based on audio-visual features. In this model, the audio stream and the visual stream are synchronized at word nodes; between word nodes, each stream has its own independent topology and its own conditional dependencies among node variables, and the word-transition variable is determined jointly by the two streams. The model therefore captures the asynchrony between the audio and visual streams at the word level. Experiments on a continuous-digit audio-visual speech database show that, in noisy environments with signal-to-noise ratios from 0 dB to 30 dB, the average recognition rate of the MS-ADBN model is 8.68% and 10.07% higher than those of the single-stream DBN model and the multi-stream Hidden Markov Model (HMM), respectively.
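As a rough sketch of the word-level synchrony described above (the notation is illustrative and not taken from the paper), such a model is often written with stream-specific state chains that are tied only at word boundaries:

$$
P\big(O^{a}_{1:T}, O^{v}_{1:T}\big) = \sum_{q^{a},\, q^{v},\, w} \prod_{t=1}^{T} P\big(O^{a}_{t}\mid q^{a}_{t}\big)\, P\big(O^{v}_{t}\mid q^{v}_{t}\big)\, P\big(q^{a}_{t}\mid q^{a}_{t-1}, w_{t}\big)\, P\big(q^{v}_{t}\mid q^{v}_{t-1}, w_{t}\big)\, P\big(w_{t}\mid w_{t-1}, WT_{t-1}\big), \qquad WT_{t} = WE^{a}_{t}\wedge WE^{v}_{t},
$$

where $q^{a}_{t}$ and $q^{v}_{t}$ are hidden sub-word state variables that evolve independently within a word, $WE^{a}_{t}$ and $WE^{v}_{t}$ indicate that the audio and visual streams have reached the end of the current word, and the word variable $w_{t}$ may change only when the deterministic word-transition variable $WT_{t}$ fires, i.e., when both streams have finished the word.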
Source
《西北工业大学学报》
Indexed in EI, CAS, CSCD, and the PKU Core Journal list (北大核心)
2008, No. 4, pp. 518-523 (6 pages)
Journal of Northwestern Polytechnical University
Funding
Supported by the international cooperation project between the Ministry of Science and Technology of China and Belgium (No. [2004].487)
Keywords
multi-stream asynchrony, Dynamic Bayesian Network (DBN), audio-visual, speech recognition, audio-visual speech database