
利用句法短语改善统计机器翻译性能 (Using Syntactic Phrases to Improve Statistical Machine Translation Performance) Cited by: 5

An Improved Syntactic Phrase Extraction Approach for Statistical Machine Translation
Abstract The phrase table is a core component of a phrase-based statistical machine translation system. A phrase table extracted with heuristic methods is heavily affected by word alignment errors and unaligned words, and the extracted phrases are not phrases in the syntactic sense. This paper proposes a bilingual syntactic phrase extraction method based on the Expectation-Maximization (EM) algorithm, which optimizes all parameter values through repeated iteration. Three techniques are examined for integrating the resulting bilingual syntactic phrases into a phrase-based statistical machine translation system: directly adding the bilingual syntactic phrases, adding new features, and re-training. Experiments show that all three methods improve the BLEU (BiLingual Evaluation Understudy) score to varying degrees, with the largest gain, 0.64 BLEU points, obtained by adding new features.
Source Journal of Chinese Information Processing (《中文信息学报》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 2, pp. 95-102 (8 pages)
Funding Research on Machine Translation in Cross-Language Information Retrieval (61173100, 61173101, 61272375)
Keywords statistical machine translation; Expectation-Maximization (EM) algorithm; bilingual syntactic phrases
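
The abstract notes that heuristically extracted phrase tables suffer from alignment errors and unaligned words. As a rough illustration only, the sketch below implements the standard alignment-consistency heuristic used in phrase-based SMT; the abstract does not say which heuristic the paper's baseline uses, so this choice is an assumption. The function name `extract_phrase_pairs`, the `max_len` parameter, and the toy sentence pair are hypothetical. It is not the paper's EM-based method.

```python
from typing import List, Set, Tuple

# A word alignment is a set of (source index, target index) links.
Alignment = Set[Tuple[int, int]]


def extract_phrase_pairs(src: List[str], tgt: List[str],
                         alignment: Alignment,
                         max_len: int = 4) -> Set[Tuple[str, str]]:
    """Collect phrase pairs consistent with the word alignment (hypothetical helper)."""
    pairs: Set[Tuple[str, str]] = set()
    tgt_aligned = {j for _, j in alignment}
    for s_start in range(len(src)):
        for s_end in range(s_start, min(s_start + max_len, len(src))):
            # Target positions linked to any word in the source span.
            linked = [j for (i, j) in alignment if s_start <= i <= s_end]
            if not linked:
                continue
            t_start, t_end = min(linked), max(linked)
            # Consistency: no target word inside the span may link outside the source span.
            if any(t_start <= j <= t_end and not (s_start <= i <= s_end)
                   for (i, j) in alignment):
                continue
            # Unaligned target words at either boundary may be attached freely.
            lo = t_start
            while True:
                hi = t_end
                while True:
                    if hi - lo + 1 <= max_len:
                        pairs.add((" ".join(src[s_start:s_end + 1]),
                                   " ".join(tgt[lo:hi + 1])))
                    hi += 1
                    if hi >= len(tgt) or hi in tgt_aligned:
                        break
                lo -= 1
                if lo < 0 or lo in tgt_aligned:
                    break
    return pairs


# Toy example (assumed data): "the" is left unaligned.
src = ["wo", "xihuan", "pingguo"]
tgt = ["I", "like", "the", "apples"]
alignment = {(0, 0), (1, 1), (2, 3)}
pairs = extract_phrase_pairs(src, tgt, alignment)
for pair in sorted(pairs):
    print(pair)
```

In this toy case the unaligned word "the" is attached to several phrases at once (for example both "like the" and "the apples"), which is one way unaligned words can inflate a heuristically built phrase table with pairs that are not syntactic phrases.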


