Abstract
This article surveys bilingual term extraction based on comparable corpora, an important branch of bilingual term extraction. Most current methods rest on the context-similarity hypothesis: if two words co-occur in the source language, their translations also tend to co-occur in the target language. Current techniques comprise two tasks: constructing context feature models for candidate terms, and optimizing those models. The article proposes a preliminary evaluation criterion for existing work, analyzes and summarizes the research on each of the two tasks by level of method complexity, and points out the open problems. It closes with an outlook on the future of bilingual term extraction based on comparable corpora. According to the experimental results reported by the surveyed researchers, the technology can be applied to machine-aided translation and to building bilingual dictionaries.
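The context-similarity idea described in the abstract can be illustrated with a minimal sketch. This is a generic context-vector illustration, not a method taken from the surveyed papers. It assumes a small seed dictionary and a fixed co-occurrence window, and all function names (`context_vector`, `translate_vector`, `cosine`, `rank_candidates`) are hypothetical.

```python
from collections import Counter
from math import sqrt


def context_vector(tokens, term, window=2):
    """Count the words that occur within `window` tokens of each `term` occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            lo, hi = max(0, i - window), i + window + 1
            counts.update(
                t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i
            )
    return counts


def translate_vector(vec, seed_dict):
    """Map a source-language context vector into the target language.

    Context words missing from the (assumed) seed dictionary are dropped.
    """
    out = Counter()
    for word, n in vec.items():
        if word in seed_dict:
            out[seed_dict[word]] += n
    return out


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_candidates(src_tokens, tgt_tokens, src_term, candidates, seed_dict, window=2):
    """Rank target-language candidates by context similarity to `src_term`."""
    src_vec = translate_vector(context_vector(src_tokens, src_term, window), seed_dict)
    scored = [
        (cosine(src_vec, context_vector(tgt_tokens, cand, window)), cand)
        for cand in candidates
    ]
    return sorted(scored, reverse=True)
```

In this sketch, a candidate scores highly when the words around it in the target corpus are translations of the words around the source term. The window size, the raw counts used as features, and the cosine measure are illustrative choices only.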
Source
Journal of the China Society for Scientific and Technical Information (《情报学报》)
CSSCI
Peking University Core Journal (北大核心)
2011, No. 12, pp. 1286-1292 (7 pages)
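Funding
This work was supported by the National High Technology Research and Development Program of China ("863" Program), project no. 2006AA010109.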
Keywords
bilingual term extraction based on comparable corpora
bilingual corpora
comparable corpora
context features