
Continuous Speech Recognition for Large Vocabulary Based on Triphone DBN Model
Cited by: 3
Abstract: To account for coarticulation in continuous speech, context-dependent triphone units are introduced into two existing dynamic Bayesian network (DBN) models: the word-phone structure DBN (WP-DBN) model and the word-phone-state structure DBN (WPS-DBN) model. This yields two novel single-stream DBN models: the word-triphone structure DBN (WT-DBN) model and the word-triphone-state structure DBN (WTS-DBN) model. The WTS-DBN model is a triphone model whose recognition unit is the triphone, and it explicitly simulates a conventional hidden Markov model (HMM) with triphone state tying. Large-vocabulary speech recognition experiments show that, on clean speech, the recognition rate of the WTS-DBN model is higher than that of the HMM, WT-DBN, WP-DBN, and WPS-DBN models by 20.53%, 40.77%, 42.72%, and 7.52%, respectively.
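As an illustration of what a context-dependent triphone unit is, the sketch below expands a phone sequence into triphone labels. It is not taken from the paper: the `left-center+right` label format is a common convention in HMM toolkits, and the function name `to_triphones` and the boundary symbol `sil` are assumptions made for this example only.

```python
def to_triphones(phones, boundary="sil"):
    """Expand a phone sequence into context-dependent triphone labels.

    Each label has the form "left-center+right". The boundary symbol used
    at the sequence edges ("sil") is an assumption for this sketch.
    """
    labels = []
    for i, center in enumerate(phones):
        # Use the boundary symbol when there is no left or right neighbor.
        left = phones[i - 1] if i > 0 else boundary
        right = phones[i + 1] if i < len(phones) - 1 else boundary
        labels.append(f"{left}-{center}+{right}")
    return labels


if __name__ == "__main__":
    # Hypothetical phone sequence for a single word.
    print(to_triphones(["k", "ae", "t"]))
```

The same center phone receives a different label in each context, which is what lets a triphone model capture coarticulation. The number of distinct labels grows roughly with the cube of the phone inventory, which is one reason state tying, as mentioned in the abstract, is typically used to share parameters among similar units.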
Source: Journal of Data Acquisition and Processing (《数据采集与处理》), indexed in CSCD and the Peking University Core Journals list, 2009, No. 1, pp. 1-6 (6 pages)
Funding: China Postdoctoral Science Foundation (20080431251); National High Technology Research and Development Program of China ("863" Program) (2007AA01Z324)
Keywords: speech recognition; dynamic Bayesian network; triphone; phone
