Abstract
Researchers increasingly recognize the value of bilingual (parallel) corpora for research on machine translation and machine-aided translation. This paper examines how translation equivalent pairs (TEPs) can be extracted from a Chinese-English parallel corpus. An iterative algorithm based on the degree of word association is proposed to identify multiword units in Chinese and English, and a hypothesis-testing approach is then used to extract Chinese-English translation equivalent pairs from the corpus. We also compare different statistical association measures and propose using a categorical hypothesis to improve the efficiency of the extraction algorithm.
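As a rough illustration of association-based candidate scoring, the sketch below ranks target-language words by how strongly they co-occur with a given source-language word across aligned sentence pairs. The abstract does not name the association measures the paper compares, so the Dice coefficient and pointwise mutual information used here are assumptions, chosen only because they are common choices for this task. The toy corpus `PAIRS` and the function names are likewise hypothetical and are not taken from the paper.

```python
import math
from collections import Counter
from itertools import product

# Hypothetical toy corpus: each item is (source tokens, target tokens)
# for one aligned sentence pair. Not data from the paper.
PAIRS = [
    (["双语", "语料库"], ["bilingual", "corpus"]),
    (["语料库", "研究"], ["corpus", "research"]),
    (["双语", "词典"], ["bilingual", "dictionary"]),
]


def cooccurrence_counts(pairs):
    """Count in how many aligned pairs each source word, each target word,
    and each (source, target) combination appears."""
    src_freq, tgt_freq, joint_freq = Counter(), Counter(), Counter()
    for src_tokens, tgt_tokens in pairs:
        src_set, tgt_set = set(src_tokens), set(tgt_tokens)
        src_freq.update(src_set)
        tgt_freq.update(tgt_set)
        joint_freq.update(product(src_set, tgt_set))
    return src_freq, tgt_freq, joint_freq


def dice(joint, f_src, f_tgt):
    """Dice coefficient: 2 * joint / (f_src + f_tgt), in the range [0, 1]."""
    return 2.0 * joint / (f_src + f_tgt)


def pmi(joint, f_src, f_tgt, n):
    """Pointwise mutual information in bits over n aligned pairs."""
    return math.log2(joint * n / (f_src * f_tgt))


def rank_candidates(pairs, src_word, measure="dice"):
    """Return (target word, score) tuples for src_word, best score first.
    Ties are broken alphabetically by target word."""
    src_freq, tgt_freq, joint_freq = cooccurrence_counts(pairs)
    n = len(pairs)
    scored = []
    for (s, t), joint in joint_freq.items():
        if s != src_word:
            continue
        if measure == "dice":
            score = dice(joint, src_freq[s], tgt_freq[t])
        else:
            score = pmi(joint, src_freq[s], tgt_freq[t], n)
        scored.append((t, score))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for word, score in rank_candidates(PAIRS, "语料库"):
        print(word, round(score, 3))
```

In a hypothesis-testing setup, each (source, target) candidate would be treated as a hypothesis and accepted only if its score clears some threshold; the threshold, and the extension from single words to multiword units, are left out of this sketch.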
Source
Terminology Standardization & Information Technology (《术语标准化与信息技术》)
2002, No. 2, pp. 24-29 (6 pages)
Keywords
English
Chinese
bilingual corpus
translation equivalent pair
automatic extraction of TEPs