
Continuous Speech Recognition Based on a Context-Dependent Triphone DBN Model

DBN model based on triphone for continuous speech recognition
Abstract: To address coarticulation in continuous speech, two single-stream Dynamic Bayesian Network (DBN) models based on context-dependent triphones are proposed: the SS-DBN-TRI model, which uses word-internal triphone expansion, and the SS-DBN-TRI-CON model, which uses cross-word triphone expansion. SS-DBN-TRI improves on the single-stream DBN (SS-DBN) model proposed by Bilmes by replacing the monophone node with a word-internal context-dependent triphone node: each word is composed of its corresponding triphone units, and the triphone units are associated with the observation vectors. SS-DBN-TRI-CON is also built on SS-DBN; it adds a previous-phone node and a next-phone node for the current phone, forming a new cross-word triphone variable node that captures coarticulation across word boundaries. This new triphone node is associated with the observation vectors, and the observation probabilities are modeled by Gaussian mixture models. Experiments on a continuous digit speech database show that SS-DBN-TRI-CON achieves the best recognition performance of the models compared.
Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2007, Issue 35, pp. 35-38 (4 pages)
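Funding: Science and technology cooperation project between the Ministry of Science and Technology of China and the Flemish Region of Belgium (No. [2004]487); Northwestern Polytechnical University talent training program (No. 04XD0102).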
Keywords: Dynamic Bayesian Network (DBN); speech recognition; triphone; monophone; context-dependent
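
The difference between the word-internal expansion (SS-DBN-TRI) and the cross-word expansion (SS-DBN-TRI-CON) described in the abstract can be illustrated at the level of unit labels. The following is a minimal sketch, not the authors' implementation: the two-word lexicon, the function names, and the HTK-style `left-center+right` label notation are all assumptions made for illustration, and the sketch ignores silence between words. In the paper the context is carried by phone nodes inside the DBN rather than by a label-rewriting step.

```python
from typing import Dict, List, Optional

# Hypothetical pronunciation lexicon for two digit words (illustrative only).
LEXICON: Dict[str, List[str]] = {
    "one": ["w", "ah", "n"],
    "two": ["t", "uw"],
}


def triphone(left: Optional[str], center: str, right: Optional[str]) -> str:
    """Build a 'left-center+right' label, omitting any missing context."""
    label = center
    if left is not None:
        label = f"{left}-{label}"
    if right is not None:
        label = f"{label}+{right}"
    return label


def expand(phones: List[str]) -> List[str]:
    """Attach left and right neighbours to every phone in one sequence."""
    labels = []
    for i, center in enumerate(phones):
        left = phones[i - 1] if i > 0 else None
        right = phones[i + 1] if i + 1 < len(phones) else None
        labels.append(triphone(left, center, right))
    return labels


def word_internal(words: List[str]) -> List[str]:
    """Word-internal expansion: context never crosses a word boundary."""
    labels: List[str] = []
    for word in words:
        labels.extend(expand(LEXICON[word]))
    return labels


def cross_word(words: List[str]) -> List[str]:
    """Cross-word expansion: context extends into neighbouring words."""
    phones = [p for word in words for p in LEXICON[word]]
    return expand(phones)


if __name__ == "__main__":
    print(word_internal(["one", "two"]))
    print(cross_word(["one", "two"]))
```

For the sequence "one two", the word-internal expansion leaves the boundary phones with one-sided context (`ah-n`, `t+uw`), whereas the cross-word expansion gives them full context (`ah-n+t`, `n-t+uw`). That boundary context is what the cross-word model adds.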

References (8)

  • 1 Murphy K. Dynamic Bayesian Networks: representation, inference and learning [D/OL]. Dept EECS, CS Division, Univ California, Berkeley [2002]. http://www.cs.ubc.ca/~murphyk/Thesis/thesis.pdf.
  • 2 Bilmes J, Zweig G. The graphical models toolkit: an open source software system for speech and time-series processing [C]//Proceedings of the IEEE International Conf on Acoustics, Speech and Signal Processing (ICASSP), 2002(4): 3916-3919.
  • 3 Bilmes J, Bartels C. Graphical model architectures for speech recognition [J]. IEEE Signal Processing Magazine, 2005, 22(5): 89-100.
  • 4 Zweig G. Speech recognition with Dynamic Bayesian Networks [D]. Berkeley: Univ California, 1998.
  • 5 Bilmes J, Zweig G, Richardson T, et al. Discriminatively structured graphical models for speech recognition, JHU-WS-2001 Final Workshop Report [R/OL]. Johns Hopkins Univ, Baltimore, MD, Tech Rep CLSP [2001]. http://www.clsp.jhu.edu/ws2001/groups/gmsr/GMROfinal-rpt.pdf.
  • 6 Bilmes J. GMTK: the graphical models toolkit [EB/OL]. http://ssli.ee.washington.edu/~bilmes/gmtk/.
  • 7 Young S J, Kershaw D, Odell J, et al. The HTK book [EB/OL]. http://htk.eng.cam.ac.uk/docs/docs.shtml.
  • 8 Hirsch H G, Pearce D. The Aurora experimental framework for the performance evaluations of speech recognition systems under noisy conditions [C]//ISCA ITRW ASR2000, September 2000.
