Abstract
A novel single-stream multi-state dynamic Bayesian network (SM-DBN) model is proposed for large-vocabulary continuous speech recognition and phone segmentation. It augments the single-stream, phone-shared DBN (SS-DBN-P) model proposed by Bilmes et al., whose basic recognition units are words, with an extra layer of hidden state nodes. In the SM-DBN model, a word is composed of its corresponding phones, a phone is described by a fixed number of states, and each state is connected directly to the observation vector. It is therefore a phone model: its basic recognition units are phones, and it describes the dynamic pronunciation changes of each phone. Recognition and segmentation experiments were performed on both a continuous digit speech database and a large-vocabulary speech database, with the results given in Tables 1 through 3 of the full paper. The preliminary results on large-vocabulary clean speech show that the recognition rate of the SM-DBN model is 13.01% and 35.2% higher than those of the HMM (hidden Markov model) and the SS-DBN-P model, respectively, and that its phone segmentation accuracy on the audio stream is 10% and 44% higher, respectively.
Source
Journal of Northwestern Polytechnical University (《西北工业大学学报》)
Indexed in: EI, CAS, CSCD, PKU Core (北大核心)
2008, Issue 2, pp. 173-178 (6 pages)
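Funding
Supported by the international cooperation project between the Ministry of Science and Technology of China and Belgium (No. [2004]487).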
Keywords
dynamic Bayesian network
single-stream multi-state dynamic Bayesian network (SM-DBN)
audio-visual speech recognition
continuous speech recognition
phone segmentation