Research on the OOV Problem in Constructing a Uyghur Speech Corpus (维吾尔语语音识别语料库中的OOV研究)

Abstract: The rich morphology of Uyghur produces a very large number of word forms, which causes a serious out-of-vocabulary (OOV) problem. To quantify the effect of OOV words on Uyghur speech recognition, the paper applies an algorithm that controls the OOV rate of a corpus test set, together with an optimal text selection algorithm, and runs experiments on test sets with different OOV rates. Both algorithms are implemented in Python. The algorithms were applied to the text transcription of a telephone speech database, and a Uyghur telephone speech database was built. The experimental results show that controlling the OOV rate of the test set effectively improves the Uyghur speech recognition rate.
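Source: Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 2, pp. 772-776 (5 pages)
Funding: Chinese Academy of Sciences "Western Action Plan" High-Tech Fund Project (KGCX2-YW-507)
Keywords: Uyghur; out-of-vocabulary (OOV) words; corpus; text selection; speech recognition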
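
The abstract names two ideas, measuring the OOV rate of a test set and selecting text so that the test set reaches a chosen OOV rate, but this record does not give the paper's actual algorithms. The following is only a minimal sketch of how such a procedure could look. It assumes whitespace-tokenized sentences and a token-level OOV rate, and it uses a simple greedy selection. The function names `build_vocabulary`, `oov_rate`, and `select_test_set` are hypothetical and do not come from the paper.

```python
from collections import Counter


def build_vocabulary(train_sentences, min_count=1):
    """Collect every word that occurs at least min_count times in the training text."""
    counts = Counter(w for s in train_sentences for w in s.split())
    return {w for w, c in counts.items() if c >= min_count}


def oov_rate(sentences, vocabulary):
    """Fraction of running word tokens that are missing from the vocabulary."""
    tokens = [w for s in sentences for w in s.split()]
    if not tokens:
        return 0.0
    return sum(w not in vocabulary for w in tokens) / len(tokens)


def select_test_set(candidates, vocabulary, target_rate, size):
    """Greedily pick up to `size` sentences whose pooled OOV rate stays near target_rate.

    Assumption: this greedy rule is illustrative only; it is not the
    selection algorithm described in the paper.
    """
    # Precompute (sentence, token count, OOV token count), skipping empty lines.
    stats = []
    for s in candidates:
        words = s.split()
        if words:
            stats.append((s, len(words), sum(w not in vocabulary for w in words)))

    selected, total, oov = [], 0, 0
    while stats and len(selected) < size:
        # Choose the sentence that brings the running OOV rate closest to the target.
        best = min(stats, key=lambda t: abs((oov + t[2]) / (total + t[1]) - target_rate))
        stats.remove(best)
        selected.append(best[0])
        total += best[1]
        oov += best[2]
    return selected
```

Building several test sets with different `target_rate` values from the same candidate pool would give the kind of controlled comparison the abstract describes, with recognition accuracy then measured separately on each set.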