
Research on Chinese-Uyghur Phrase Pair Extraction in Statistical Machine Translation (cited by: 4)
Abstract: Bilingual phrase pair extraction is a key step in training the phrase translation model in phrase-based statistical machine translation. Because the Chinese-Uyghur parallel corpus is limited in size, however, data sparsity is severe. This paper proposes an improved phrase extraction algorithm. The algorithm first handles the case in the word alignment matrix where one Chinese word is aligned to several Uyghur words (including non-contiguous ones), then extracts phrase pairs using Och's method, and finally extracts bilingual phrases while taking the SOV word order of Uyghur into account. Experiments show that the algorithm extracts Chinese-Uyghur phrase pairs fairly accurately while extracting as many pairs as possible, and thereby improves the quality of the translation model.
Source: Journal of Xinjiang University (Natural Science Edition), CAS, 2010, No. 3, pp. 349-352 (4 pages)
Funding: National Natural Science Foundation of China (60663006, 60763006)
Keywords: statistical machine translation; phrase extraction; Chinese-Uyghur phrase pairs
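
The second step named in the abstract, extraction "using Och's method", is commonly understood as collecting every phrase pair that is consistent with the word alignment. The sketch below illustrates only that baseline criterion. It is not the paper's improved algorithm: the handling of one-to-many alignments and of SOV word order is not reproduced, because the abstract gives no details. The function name `extract_phrase_pairs`, the `max_len` limit, and the placeholder tokens (`c1`, `u1`, ...) are assumptions made for illustration, and the sketch does not extend spans over unaligned target words.

```python
from typing import List, Set, Tuple

# One alignment link: (source word index, target word index).
Link = Tuple[int, int]


def extract_phrase_pairs(
    src: List[str],
    tgt: List[str],
    alignment: Set[Link],
    max_len: int = 4,  # hypothetical phrase length limit
) -> Set[Tuple[str, str]]:
    """Collect phrase pairs consistent with the word alignment.

    A source span and a target span are consistent when every link that
    touches either span lies inside both spans.
    """
    pairs: Set[Tuple[str, str]] = set()
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [t for (s, t) in alignment if s_start <= s <= s_end]
            if not linked:
                continue
            # Smallest target span covering all linked positions.
            t_start, t_end = min(linked), max(linked)
            if t_end - t_start + 1 > max_len:
                continue
            # Reject if a target word in the span links outside the source span.
            violates = any(
                t_start <= t <= t_end and not (s_start <= s <= s_end)
                for (s, t) in alignment
            )
            if violates:
                continue
            pairs.add(
                (" ".join(src[s_start:s_end + 1]),
                 " ".join(tgt[t_start:t_end + 1]))
            )
    return pairs


# Toy example with placeholder tokens: the last two words swap order.
toy_src = ["c1", "c2", "c3"]
toy_tgt = ["u1", "u2", "u3"]
toy_alignment = {(0, 0), (1, 2), (2, 1)}
toy_pairs = extract_phrase_pairs(toy_src, toy_tgt, toy_alignment)

for pair in sorted(toy_pairs):
    print(pair)
```

In the toy alignment the second and third source words swap order on the target side, so the span `c1 c2` has no consistent target span while `c2 c3` does. A source word linked to two non-contiguous target words is likewise rejected on its own whenever the word in the gap links elsewhere, which appears to be the kind of case the abstract's first step is meant to address.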

References (8)

  • 1. Richard Zens, Franz Josef Och, Hermann Ney. Phrase-based statistical machine translation[C]. In: Proc. of the German Conference on Artificial Intelligence (KI 2002), 2002, 18-32.
  • 2. Stephan Vogel. The CMU statistical machine translation system[C]. In: Proc. of the Machine Translation Summit IX, New Orleans, LA, 2003.
  • 3. Franz Josef Och, Hermann Ney. The alignment template approach to statistical machine translation[J]. Computational Linguistics, 2004, 30(4):417-449.
  • 4. Stephan Vogel. PESA: Phrase pair extraction as sentence splitting[C]. In: Proc. of the Machine Translation Summit X, Phuket, Thailand, September 2005.
  • 5. Bing Zhao, Stephan Vogel. A generalized alignment-free phrase extraction[C]. In: Proceedings of the ACL Workshop on Building and Using Parallel Texts, 2005, 141-144.
  • 6. He Yanqing, Zhou Yu, Zong Chengqing, Wang Xia. A phrase translation pair extraction method based on a "relaxed scale"[J]. Journal of Chinese Information Processing, 2007, 21(5):91-95. Cited by: 6
  • 7. Qiang Jing, Zhang Jian. An improved phrase extraction algorithm for phrase-based statistical machine translation[J]. Computer Engineering and Applications, 2008, 44(13):147-149. Cited by: 3
  • 8. Philipp Koehn, Franz Josef Och, Daniel Marcu. Statistical phrase-based translation[C]. In: Proceedings of the Human Language Technology Conference (HLT-NAACL), 2003, 127-133.
