A speech recognition method based on an improved hidden Markov model (cited by: 19)
Abstract: The hidden Markov model (HMM) used in speech recognition assumes that successive observations within a state are independent and identically distributed, which fits the actual characteristics of speech poorly, and its use of duration information is also deficient. To address these defects, the HMM is improved and the Markov family model (MFM) is proposed. Mathematically, an MFM is a multiple stochastic process composed of several Markov chains, whereas an HMM is a double stochastic process, so the HMM can be regarded as a special case of the MFM. The MFM replaces the independence assumption of the HMM with a conditional independence assumption. From a statistical point of view, independence is a stronger assumption than conditional independence, so a speech model based on the MFM is closer to the actual physical process of speech than one based on the HMM. State duration information is introduced into the MFM speech recognition model, which can then adjust the duration of speech units automatically according to speaking rate. The resulting duration-distribution-based MFM recognition model integrates frame-based and segment-based acoustic modeling techniques. Speaker-independent continuous speech recognition experiments show that the improved model, which uses state duration information, performs markedly better than the classical HMM.
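Author: 袁里驰 (YUAN Li-chi)
Source: Journal of Central South University: Science and Technology (《中南大学学报(自然科学版)》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2008, No. 6, pp. 1303-1308 (6 pages)
Funding: National Natural Science Foundation of China (60663007); Postdoctoral Science Foundation of Central South University (2007)
Keywords: hidden Markov model; Markov family model; duration; speech recognition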
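
The duration-modeling defect of the classical HMM that the abstract refers to can be illustrated with a small sketch. It relies only on a standard textbook fact: a single HMM state with self-transition probability a stays for d frames with probability a^(d-1)(1-a), so the implied duration distribution is geometric and always peaks at one frame. The function names, the self-loop value 0.75, and the Poisson-shaped explicit duration table are assumptions made for illustration; this is not the paper's Markov family model or its duration model.

```python
import math


def geometric_duration_pmf(self_loop_prob, max_duration):
    """Duration pmf implied by one HMM state with self-loop probability a:
    P(d) = a**(d - 1) * (1 - a) for d = 1..max_duration (truncated, not renormalized)."""
    a = self_loop_prob
    return [a ** (d - 1) * (1.0 - a) for d in range(1, max_duration + 1)]


def expected_duration(self_loop_prob):
    """Mean of the (untruncated) geometric duration distribution: 1 / (1 - a)."""
    return 1.0 / (1.0 - self_loop_prob)


def explicit_duration_pmf(mean, max_duration):
    """Hypothetical explicit duration table: Poisson-shaped weights over
    d = 1..max_duration, renormalized to sum to 1. Chosen only for illustration."""
    weights = [math.exp(-mean) * mean ** d / math.factorial(d)
               for d in range(1, max_duration + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mode_of(pmf):
    """Return the most probable duration (1-based frame count)."""
    return max(range(len(pmf)), key=lambda i: pmf[i]) + 1


if __name__ == "__main__":
    geo = geometric_duration_pmf(0.75, 20)
    exp = explicit_duration_pmf(5.5, 20)
    print("geometric mode:", mode_of(geo), "mean:", expected_duration(0.75))
    print("explicit mode:", mode_of(exp))
```

Under the geometric model the most likely duration is a single frame regardless of the self-loop probability, which rarely matches real speech units. An explicit duration table can place its peak at a realistic length, and this is the general motivation for duration-distribution-based models.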