Approach to Statistical Machine Translation Based on Phrase Fuzzy-Matching and Sentence Expansion (cited 4 times)
Abstract: In recent years, phrase-based statistical translation models have received wide attention in machine translation research and have achieved good translation performance. However, because current phrase-based systems use exact matching during decoding, they often suffer from data sparseness. On the one hand, some phrases have no exact match in the phrase table obtained from training and therefore become unknown phrases; on the other hand, a large number of the phrases in the phrase table are never fully exploited. To address this, we propose a translation approach based on phrase fuzzy matching and sentence expansion. For a phrase that does not occur in the phrase table, fuzzy matching is used to find similar phrases in the table; each similar phrase then replaces the original phrase to generate an expanded sentence, and all of the expanded sentences are translated. Because not every expanded sentence improves on the translation of the original sentence, a combination classifier is applied after translation to select the best result. Experiments show that this approach effectively improves the quality of the system's translations.
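The abstract does not specify the similarity measure or the classifiers used, so the following Python code is only a minimal sketch of the three steps under stated assumptions: fuzzy matching is approximated by word-level edit distance with a hypothetical threshold `max_dist`, the phrase table is reduced to a set of source-side phrases, the span of the unknown phrase is assumed to be already known, and the combination classifier is stood in for by simple majority voting over hypothetical classifier callables. None of these names come from the paper.

```python
from collections import Counter


def edit_distance(a, b):
    """Word-level Levenshtein distance between token sequences a and b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute x -> y
        prev = cur
    return prev[-1]


def fuzzy_match(phrase, phrase_table, max_dist=1):
    """Return table phrases within max_dist edits of phrase (assumed criterion)."""
    return sorted(p for p in phrase_table
                  if p != phrase and edit_distance(p, phrase) <= max_dist)


def expand_sentence(tokens, start, end, phrase_table, max_dist=1):
    """Replace the unknown phrase tokens[start:end] with each fuzzy match.

    The original sentence is kept as the first candidate, since an
    expansion does not necessarily translate better than the original.
    """
    unknown = tuple(tokens[start:end])
    expansions = [list(tokens)]
    for similar in fuzzy_match(unknown, phrase_table, max_dist):
        expansions.append(list(tokens[:start]) + list(similar) + list(tokens[end:]))
    return expansions


def select_by_vote(candidates, classifiers):
    """Hypothetical stand-in for the combination classifier.

    Each classifier returns the index of its preferred candidate; the
    majority wins and ties go to the lowest index (the original sentence).
    """
    votes = Counter(clf(candidates) for clf in classifiers)
    best = max(votes.values())
    return min(i for i, v in votes.items() if v == best)


# Toy data (hypothetical): source-side phrases only.
phrase_table = {("book", "a", "room"), ("book", "one", "room"),
                ("reserve", "a", "table")}
sentence = ["please", "book", "the", "room", "tonight"]
candidates = expand_sentence(sentence, 1, 4, phrase_table)
for c in candidates:
    print(" ".join(c))
```

With the toy table, the unknown phrase "book the room" is one substitution away from "book a room" and "book one room", so the sketch yields the original sentence plus two expanded sentences; in the real system each candidate would be decoded and the classifier would then choose among the resulting translations.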
Authors: Liu Peng, Zong Chengqing
Source: Journal of Chinese Information Processing, 2009, No. 5, pp. 40-46 (7 pages); indexed in CSCD and listed as a Peking University core journal
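Funding: National Natural Science Foundation of China (60575043, 60736014); National High Technology Research and Development Program of China (863 Program) (2006AA01Z194, 2006AA010108)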
Keywords: artificial intelligence; machine translation; phrase-based statistical machine translation; fuzzy matching; combination classifier