Research on Large-Vocabulary Continuous Speech Recognition and Phone Segmentation Based on Dynamic Bayesian Networks (cited by: 1)

A Novel SM-DBN Model for Large-Vocabulary Continuous Speech Recognition and Phone Segmentation
Abstract: A novel single-stream multi-state dynamic Bayesian network (SM-DBN) model is proposed for large-vocabulary continuous speech recognition and phone segmentation. It extends the single-stream, phone-shared DBN (SS-DBN-P) model proposed by Bilmes et al., whose basic recognition units are words, by adding an extra layer of hidden state nodes. In the SM-DBN model, each word is composed of its corresponding phones, each phone is described by a fixed number of states, and each state is connected directly to the observation vector. The model is therefore a phone model: its basic recognition units are phones, and it describes how the pronunciation of each phone evolves over time. Recognition and segmentation experiments were performed on both a continuous digit speech database and a large-vocabulary speech database, with results given in Tables 1 through 3 of the full paper. On large-vocabulary clean speech, the preliminary results show that the recognition rate of the SM-DBN model is 13.01% and 35.2% higher than those of the hidden Markov model (HMM) and the SS-DBN-P model, respectively, and that its phone segmentation accuracy on the audio stream is 10% and 44% higher, respectively.
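Source: Journal of Northwestern Polytechnical University (《西北工业大学学报》; indexed in EI, CAS, CSCD, and the PKU Core list), 2008, No. 2, pp. 173-178 (6 pages)
Funding: Supported by the international cooperation project between the Ministry of Science and Technology of China and Belgium (No. [2004]487)
Keywords: dynamic Bayesian network; audio-visual speech recognition; phone segmentation; single-stream multi-state dynamic Bayesian network (SM-DBN); continuous speech recognition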
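
The word, phone, and state hierarchy described in the abstract can be illustrated with a small bookkeeping sketch. This is not the paper's DBN inference or training procedure; it only shows how a word sequence unrolls into phone states and how phone boundaries could be read off a frame-level state alignment. The toy lexicon, the value of `STATES_PER_PHONE`, and the function names are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical toy lexicon (word -> phone sequence); not taken from the paper.
LEXICON = {"one": ["w", "ah", "n"], "two": ["t", "uw"]}

# Assumed value; the abstract only says each phone has "a fixed number" of states.
STATES_PER_PHONE = 3


def expand_words(words, lexicon=LEXICON, n_states=STATES_PER_PHONE):
    """Unroll words into units (word_pos, phone_pos, word, phone, state)."""
    units = []
    for word_pos, word in enumerate(words):
        for phone_pos, phone in enumerate(lexicon[word]):
            for state in range(n_states):
                units.append((word_pos, phone_pos, word, phone, state))
    return units


def phone_segments(units, frame_to_unit):
    """Collapse a frame-level alignment into (phone, start_frame, end_frame).

    frame_to_unit[t] is the index into `units` occupied at frame t.
    Frames are grouped by (word_pos, phone_pos), so two adjacent
    occurrences of the same phone are kept as separate segments.
    """
    segments = []
    frames = enumerate(frame_to_unit)
    for _, group in groupby(frames, key=lambda fu: units[fu[1]][:2]):
        group = list(group)
        start, end = group[0][0], group[-1][0]
        phone = units[group[0][1]][3]
        segments.append((phone, start, end))
    return segments


if __name__ == "__main__":
    units = expand_words(["two"])
    # A made-up 9-frame alignment through the 6 states of "two".
    alignment = [0, 0, 1, 2, 2, 3, 4, 4, 5]
    print(phone_segments(units, alignment))
```

In a real system the alignment would come from decoding the model against the observation vectors; here it is supplied by hand.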

References (7)

  • 1 Zweig G. Speech Recognition with Dynamic Bayesian Networks. [Ph.D. Thesis]. University of California, Berkeley, 1998
  • 2 Zweig G. Bayesian Network Structures and Inference Techniques for Automatic Speech Recognition. Computer Speech and Language, 2003, 17: 173-193
  • 3 Murphy K. Dynamic Bayesian Networks: Representation, Inference and Learning. [Ph.D. Thesis]. University of California, Berkeley, 2002. http://www.cs.ubc.ca/~murphyk/Thesis/thesis.pdf
  • 4 Bilmes J, Zweig G, Richardson T, et al. Discriminatively Structured Graphical Models for Speech Recognition: JHUWS-2001 Final Workshop Report. Johns Hopkins Univ, Baltimore, MD, Tech Rep CLSP, 2001. http://www.clsp.jhu.edu/ws2001/groups/gmsr/GMRO-final-rpt.pdf
  • 5 Bilmes J, Bartels C. Graphical Model Architectures for Speech Recognition. IEEE Signal Processing Magazine, 2005, 22(5): 89-100
  • 6 Bilmes J, Zweig G. The Graphical Models Toolkit: An Open Source Software System for Speech and Time-Series Processing. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2002, 4: 3916-3919
  • 7 Bilmes J. GMTK: The Graphical Models Toolkit. http://ssli.ee.washington.edu/~bilmes/gmtk/
