Abstract
Query translation based on a machine-readable dictionary (MRD) is widely used in cross-language information retrieval (CLIR), but it suffers from inherent translation ambiguity, most notably when one word maps to several translation entries. Existing approaches have two main drawbacks: they cannot translate non-compositional compound words and phrases correctly, and combining a bilingual dictionary with other translation resources incurs considerable computational overhead. With the aim of building a practical bidirectional English-Chinese CLIR system, this work starts from an existing bilingual dictionary of scientific and technical words and phrases and studies how a parallel corpus can be used to improve it. A novel method is proposed for computing translation probabilities for the MRD entries from a sentence-aligned parallel corpus. The goal is a probabilistic bilingual dictionary for the scientific and technical domain that supports real-time query translation disambiguation in CLIR.
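The abstract does not say how the translation probabilities are computed. Purely as an illustration, the sketch below assumes a simple relative-frequency estimate: for each dictionary entry, count the aligned sentence pairs in which the source term and a candidate translation co-occur, then normalize the counts. The function name `estimate_translation_probs`, the substring matching, the uniform fallback for unseen entries, and the toy data are all assumptions made for this example, not the paper's method.

```python
from collections import Counter


def estimate_translation_probs(dictionary, sentence_pairs):
    """Estimate P(translation | source term) from sentence-aligned pairs.

    dictionary:     {english_term: [chinese_candidate, ...]}
    sentence_pairs: [(english_sentence, chinese_sentence), ...]

    Hypothetical helper: a co-occurrence count over aligned sentences,
    normalized per source term. Matching is by plain substring, so a
    multi-word phrase is treated as one unit rather than word by word.
    """
    probs = {}
    for term, candidates in dictionary.items():
        counts = Counter()
        for en, zh in sentence_pairs:
            if term.lower() in en.lower():
                for cand in candidates:
                    if cand in zh:
                        counts[cand] += 1
        total = sum(counts.values())
        if total == 0:
            # Assumption: fall back to a uniform distribution when the
            # corpus offers no evidence for this entry.
            probs[term] = {c: 1.0 / len(candidates) for c in candidates}
        else:
            probs[term] = {c: counts[c] / total for c in candidates}
    return probs


# Toy data, invented for illustration only.
toy_dictionary = {"bus": ["总线", "公共汽车"]}
toy_pairs = [
    ("The data bus is 32 bits wide.", "数据总线宽度为32位。"),
    ("The address bus connects the CPU and memory.", "地址总线连接CPU和内存。"),
    ("He took the bus to the lab.", "他乘公共汽车去实验室。"),
]
toy_probs = estimate_translation_probs(toy_dictionary, toy_pairs)
```

Because the probabilities are computed once, offline, and stored with the dictionary, query-time disambiguation under this sketch reduces to a lookup, which is in keeping with the real-time support the abstract describes.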
Source
《图书情报工作》
CSSCI
PKU Core Journal (北大核心)
2011, No. 20, pp. 126-128, 114 (4 pages in total)
Library and Information Service
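Funding
China Postdoctoral Science Foundation project "Research on Query Translation Disambiguation Techniques for Cross-Language Retrieval Based on Query Classification" (Grant No. 20090450465)
One of the research outcomes of the Institute of Scientific and Technical Information of China 2010 discipline development project "Natural Language Processing" (Project No. XK2010-6)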
Keywords
query translation
machine readable dictionary
sentence-aligned parallel corpus