
A Unified Language Model Adaptation Framework for Chinese Broadcast News Recognition
Abstract: Language model (LM) adaptation aims to reduce the linguistic mismatch between the training corpus and the recognition task. This mismatch has three parts: lexicon mismatch, style and topic mismatch, and mismatch in the model's n-gram probability distribution. This paper proposes a new non-iterative approach to Chinese new-word extraction and a novel open-vocabulary Chinese LM. Building on these techniques, we present a unified LM adaptation framework for broadcast speech recognition. The framework combines four components: the non-iterative new-word extraction approach, the open-vocabulary Chinese LM, a perplexity (PPL) based background corpus selection approach, and an n-gram distribution adaptation module. We also analyze how LM adaptation affects the recognition of named-entity words. Experiments show a relative character error rate reduction of about 10% and an absolute increase of 4% in the recognition accuracy of named-entity words.
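The abstract names a perplexity-based background corpus selection step but does not specify it. The sketch below is one possible reading. It assumes a character-level bigram model with add-one smoothing, which is not the paper's stated model, smoothing method, or threshold. The model is trained on in-domain text, each background document is scored by its perplexity under that model, and the lowest-perplexity fraction is kept. The function names and the `keep_ratio` parameter are hypothetical.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


def train_bigram(sentences):
    """Count bigram contexts and character bigrams from in-domain text.
    (Assumption: character-level bigrams; the paper's model order is not given.)"""
    contexts, bigrams = Counter(), Counter()
    for sent in sentences:
        tokens = [BOS] + list(sent) + [EOS]
        for prev, cur in zip(tokens, tokens[1:]):
            contexts[prev] += 1
            bigrams[(prev, cur)] += 1
    return contexts, bigrams


def perplexity(text, model, vocab_size):
    """Per-token perplexity of `text` under an add-one smoothed bigram model.
    `vocab_size` counts every predictable token (all characters plus EOS)."""
    contexts, bigrams = model
    tokens = [BOS] + list(text) + [EOS]
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
        log_prob += math.log(p)
    n_predicted = len(tokens) - 1  # BOS itself is never predicted
    return math.exp(-log_prob / n_predicted)


def select_background(in_domain, background, keep_ratio=0.5):
    """Hypothetical selection rule: rank background documents by perplexity
    under the in-domain model and keep the lowest `keep_ratio` fraction."""
    model = train_bigram(in_domain)
    vocab = {ch for doc in in_domain + background for ch in doc} | {EOS}
    ranked = sorted(background, key=lambda d: perplexity(d, model, len(vocab)))
    n_keep = max(1, int(len(ranked) * keep_ratio))
    return ranked[:n_keep]


if __name__ == "__main__":
    in_domain = ["今天新闻联播", "新闻报道今天的天气"]
    background = ["今天新闻", "股票市场行情", "新闻联播报道"]
    # The finance sentence shares no characters with the news text,
    # so it receives the highest perplexity and is dropped.
    print(select_background(in_domain, background, keep_ratio=0.67))
```

Keeping a fixed fraction avoids picking an absolute perplexity cutoff, because that value depends on vocabulary size and smoothing. A fixed PPL threshold is an equally plausible reading of the abstract.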
Source: Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2007, No. 4, pp. 73-79 (7 pages)
Funding: National 863 Program of China (Grant No. 2006AA010103)
Keywords: computer application; Chinese information processing; language model adaptation; new word extraction; open-vocabulary LM