
Automatic Extraction of Translational Equivalence Based on Bilingual Corpora (cited by: 8)
Abstract: This paper proposes a method for automatically extracting multi-word translation equivalents from English-Chinese parallel corpora. Candidate translation units are first obtained with an N-gram model. The translation probability of each candidate pair is then computed from statistical co-occurrence, and a greedy, iterative strategy extracts the translation equivalents. Three commonly used co-occurrence measures are compared for computing the translation probability: the Dice coefficient, the phi-square coefficient, and the log-likelihood ratio. Experiments show that the log-likelihood ratio extracts translation equivalents more effectively when the training corpus is small. Compared with previous approaches, the method better handles multi-word unit correspondences and the problem of indirect association. Experiments on a real corpus produced promising results.
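The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the sentence-level co-occurrence counting, the filter that keeps only positively associated pairs, and the one-to-one greedy linking rule are all assumptions made for illustration. The three measures follow their standard 2x2 contingency-table definitions, where a is the number of sentence pairs containing both units, b and c the numbers containing only the source or only the target unit, and d the number containing neither.

```python
import math
from collections import Counter
from itertools import product

def ngrams(tokens, max_n=2):
    """All contiguous word n-grams of length 1..max_n, as a set of tuples."""
    return {tuple(tokens[i:i + n])
            for n in range(1, max_n + 1)
            for i in range(len(tokens) - n + 1)}

def count_cooccurrence(pairs, max_n=2):
    """Count the sentence pairs containing each source unit, target unit, and both."""
    src, tgt, joint = Counter(), Counter(), Counter()
    for s_tokens, t_tokens in pairs:
        s_units, t_units = ngrams(s_tokens, max_n), ngrams(t_tokens, max_n)
        src.update(s_units)
        tgt.update(t_units)
        joint.update(product(s_units, t_units))
    return src, tgt, joint, len(pairs)

def dice(a, b, c, d):
    return 2 * a / (2 * a + b + c)

def phi_square(a, b, c, d):
    denom = (a + b) * (a + c) * (b + d) * (c + d)
    return (a * d - b * c) ** 2 / denom if denom else 0.0

def log_likelihood_ratio(a, b, c, d):
    """G^2 = 2 * sum(O * ln(O / E)) over the four cells of the table."""
    n = a + b + c + d
    def term(k, row, col):
        return k * math.log(k * n / (row * col)) if k else 0.0
    return 2 * (term(a, a + b, a + c) + term(b, a + b, b + d)
                + term(c, c + d, a + c) + term(d, c + d, b + d))

def score_candidates(pairs, measure, max_n=2):
    """Score every co-occurring (source unit, target unit) candidate pair."""
    src, tgt, joint, n = count_cooccurrence(pairs, max_n)
    scores = {}
    for (s, t), both in joint.items():
        a, b, c = both, src[s] - both, tgt[t] - both
        d = n - a - b - c
        if a * d > b * c:  # assumption: keep positively associated pairs only
            scores[(s, t)] = measure(a, b, c, d)
    return scores

def greedy_link(scores):
    """Accept pairs in descending score order; each unit is linked at most once
    (an assumed one-to-one rule, not necessarily the paper's strategy)."""
    used_src, used_tgt, links = set(), set(), []
    for (s, t), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
        if s not in used_src and t not in used_tgt:
            links.append((s, t))
            used_src.add(s)
            used_tgt.add(t)
    return links

# Hypothetical toy corpus of tokenized English-Chinese sentence pairs.
toy_corpus = [
    (["red", "car"], ["红色", "汽车"]),
    (["red", "house"], ["红色", "房子"]),
    (["blue", "car"], ["蓝色", "汽车"]),
    (["blue", "house"], ["蓝色", "房子"]),
]
links = greedy_link(score_candidates(toy_corpus, log_likelihood_ratio, max_n=1))
```

Once a unit has been linked, the greedy step discards every lower-scoring pair that reuses it. This is one simple way to suppress indirect associations, where a word also co-occurs with the translation of its neighbour. Calling `score_candidates` with `max_n=2` or higher adds multi-word units to the candidate set. A fuller treatment would also have to resolve overlaps between a multi-word unit and its sub-units, which this sketch does not attempt.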
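Source: Chinese High Technology Letters (《高技术通讯》), indexed in EI, CAS, CSCD; 2003, No. 5, pp. 19-24 (6 pages)
Funding: Supported by the 863 Program (2001AA114101).
Keywords: bilingual corpora; automatic extraction; N-gram model; translation probability; computer; knowledge acquisition; candidate translation units; translational equivalence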

