
Learning New Words for Pronunciation Lexicon from Audio Data (Cited by: 1)
Abstract  A new-word learning method for pronunciation lexicons based on a hybrid speech recognition system is proposed to solve the problem that existing lexicon-expansion methods can only learn new words from text data and cannot learn them from audio data. The method first uses both a syllable-based and a graphone-based (letter-to-phoneme) hybrid recognition system to detect out-of-vocabulary words in the audio data, exploiting the complementarity of the two systems to obtain as many new words and pronunciation candidates as possible. The candidate words and pronunciations are then optimized with a perceptron model and a maximum entropy model to reduce the error rate. Finally, the lexicon is expanded and the language model parameters are updated using syntactic and semantic information. Continuous speech recognition experiments on the Wall Street Journal (WSJ) corpus show that the method effectively learns unknown new words from audio data and that the data optimization strategies greatly improve the accuracy of the learned words and pronunciations. In terms of word error rate, the system with the expanded lexicon achieves a relative improvement of about 13.4% over the baseline system.
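The pipeline summarized in the abstract, merging out-of-vocabulary candidates from two complementary recognizers and then filtering them with a linear discriminative model, can be sketched roughly as follows. All function names, features, weights, and example data here are illustrative assumptions, not the paper's implementation; the actual system uses trained perceptron and maximum entropy models over recognizer-derived features.

```python
# Illustrative sketch of the lexicon-expansion pipeline: union the OOV
# hypotheses of two complementary systems, then keep only candidates that
# pass a simple linear (perceptron-style) filter. Hypothetical throughout.
from collections import defaultdict

def merge_candidates(syllable_hyps, graphone_hyps):
    """Union (word, pronunciation, score) hypotheses from the syllable-based
    and graphone-based systems, keeping the best score per candidate pair."""
    merged = defaultdict(float)
    for word, pron, score in syllable_hyps + graphone_hyps:
        merged[(word, pron)] = max(merged[(word, pron)], score)
    return merged

def filter_candidates(merged, weights, threshold=0.0):
    """Toy linear filter standing in for the perceptron / maximum entropy
    optimization step: keep candidates whose weighted feature score clears
    a threshold, so fewer erroneous entries reach the lexicon."""
    kept = {}
    for (word, pron), acoustic_score in merged.items():
        features = {
            "acoustic": acoustic_score,
            # Ratio of phone count to letter count (a made-up feature).
            "len_ratio": len(pron.split()) / max(len(word), 1),
        }
        score = sum(weights.get(f, 0.0) * v for f, v in features.items())
        if score >= threshold:
            kept[word] = pron
    return kept

# Example: the two systems propose overlapping OOV candidates.
syl = [("berlusconi", "b er l uh s k ow n iy", 0.8)]
gra = [("berlusconi", "b er l uh s k ow n iy", 0.6),
       ("xcorp", "k s k ao r p", 0.2)]
lexicon = filter_candidates(merge_candidates(syl, gra),
                            weights={"acoustic": 1.0, "len_ratio": 0.1},
                            threshold=0.5)
# The high-confidence candidate survives; the low-scoring one is dropped.
```

In the paper the surviving entries would then be added to the pronunciation lexicon and the language model parameters re-estimated; that step is omitted here.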
Source: Journal of Xi'an Jiaotong University (EI, CAS, CSCD, Peking University Core), 2016, No. 6, pp. 75-82 (8 pages)
Funding: National Natural Science Foundation of China (61175017, 61403415, 61302107)
Keywords: speech recognition; pronunciation lexicon; new word learning; out-of-vocabulary words