
Triphone models of Lhasa Tibetan based on decision tree
Abstract: The distributions of monophones and triphones in Lhasa Tibetan are computed, and the need for context-dependent acoustic models in Tibetan large-vocabulary continuous speech recognition (LVCSR) is analyzed. The phoneme is chosen as the modeling unit and, in keeping with the characteristics of Tibetan, a syllable-based pronunciation dictionary is built. Several key issues and the basic algorithms for building triphone models with decision trees are discussed. Drawing on the International Phonetic Alphabet classification and empirical knowledge, a set of 38 phone classes for Lhasa Tibetan and the corresponding decision-tree question set are defined. A training corpus of 8,170 sentences from 20 speakers is built, and decision-tree-based triphone models of Lhasa Tibetan are built and trained on the HTK platform. Recognition results under different numbers of hidden Markov model (HMM) states and Gaussian mixture components are analyzed, and a complete scheme for Tibetan LVCSR is established.
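As a rough illustration of the monophone and triphone statistics mentioned in the abstract, the following minimal Python sketch expands phone sequences into HTK-style `l-c+r` triphone labels and counts how often each unit occurs. The function names, the `sil` boundary padding, and the toy phone sequences are assumptions made for this example only; they are not taken from the paper and are not real Lhasa Tibetan transcriptions.

```python
from collections import Counter


def to_triphones(phones, boundary="sil"):
    """Expand a phone sequence into HTK-style left-centre+right labels.

    Assumption: utterance edges are padded with a silence symbol so that
    every phone has both a left and a right context.
    """
    padded = [boundary] + list(phones) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


def count_units(utterances):
    """Count monophone and triphone occurrences over a list of utterances."""
    mono = Counter()
    tri = Counter()
    for phones in utterances:
        mono.update(phones)
        tri.update(to_triphones(phones))
    return mono, tri


# Hypothetical toy data, used only to exercise the functions.
utterances = [["k", "a", "t"], ["k", "a", "p"], ["t", "a", "k"]]
mono, tri = count_units(utterances)

print(len(mono), "distinct monophones;", len(tri), "distinct triphones")
```

Even on a toy inventory the number of distinct triphones grows much faster than the number of monophones, which is the usual motivation for tying triphone states with a decision tree when training data is limited.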
Source: Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal, 2013, No. 9, pp. 146-150 (5 pages)
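Funding: National Natural Science Foundation of China (61262054); Fundamental Research Funds for the Central Universities, Northwest University for Nationalities (zyz2011100)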
Keywords: Tibetan; Lhasa dialect; large-vocabulary continuous speech recognition (LVCSR); hidden Markov model (HMM); triphone model











