A Comparative Study of Language Model Smoothing Algorithms on a Limited Corpus

Published English title: Comparative Study on Algorithms of Limited Corpus Language Model
Abstract: With the widespread use of the Internet and the rapid development of science and technology, the volume of information people receive has grown sharply, and machine translation faces strong market demand. Extracting a language model from existing text is a central part of a machine translation system and determines its translation performance. Translation systems for specific domains often suffer from a shortage of relevant text, so a language model cannot be trained on a large-scale corpus, which leads to a severe data sparsity problem. This paper experimentally studies the choice of smoothing algorithm for language models trained on a limited corpus. The experimental results show that, when the corpus is extremely limited, Good-Turing smoothing can exploit its strength in re-estimating low-frequency words and handles the data sparsity of the training corpus well. Using this method can improve language model performance under limited-corpus conditions.
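To make the low-frequency re-estimation mentioned in the abstract concrete, the following is a minimal sketch of basic Good-Turing count adjustment on unigrams. It is not the authors' implementation: the function name `good_turing_adjusted_counts`, the unigram setting, and the fallback to the raw count when no word type occurs r + 1 times are all assumptions made for illustration.

```python
from collections import Counter


def good_turing_adjusted_counts(tokens):
    """Return (adjusted_counts, unseen_mass) using basic Good-Turing.

    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = Counter(tokens)
    total = sum(counts.values())
    # N_r: number of distinct word types observed exactly r times.
    freq_of_freq = Counter(counts.values())

    adjusted = {}
    for word, r in counts.items():
        n_r = freq_of_freq[r]
        n_r_plus_1 = freq_of_freq.get(r + 1, 0)
        if n_r_plus_1 > 0:
            # Good-Turing estimate: r* = (r + 1) * N_{r+1} / N_r
            adjusted[word] = (r + 1) * n_r_plus_1 / n_r
        else:
            # Assumption: keep the raw count when N_{r+1} is zero.
            adjusted[word] = float(r)

    # Probability mass reserved for unseen events: N_1 / N.
    unseen_mass = freq_of_freq.get(1, 0) / total
    return adjusted, unseen_mass


if __name__ == "__main__":
    sample = "a a a b b c c d e f".split()
    adjusted, unseen = good_turing_adjusted_counts(sample)
    print(adjusted, unseen)
```

Because of the raw-count fallback, the adjusted counts in this sketch are not guaranteed to sum, together with the unseen mass, to a proper distribution; practical toolkits usually smooth the N_r values first or apply the adjustment only below a count threshold.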
Source: Microcomputer Applications (《微型电脑应用》), 2010, No. 12, pp. 18-20, 1 (3 pages in total)
Funding: Supported by the National Natural Science Foundation of China (Grant No. 60574063)
Keywords: natural language processing; limited corpus; language model; data sparsity
