
Acquirement of Anchor Word Candidate Set Based on Statistical Approach (基于统计的锚点词候选集的获取)
Abstract Paragraph alignment establishes a correspondence between each paragraph in a bilingual corpus and its translation, and so provides resources for subsequent sentence-level, phrase-level, and word-level alignment. It serves as a connecting link across alignment research as a whole. Using anchor words to perform paragraph alignment is a common and effective method. Anchor words should be few in number and high in precision; more importantly, they must have relatively distinctive features that indicate a correspondence between two paragraphs. In other words, not every word in a text can serve as an anchor word, and more anchor words are not necessarily better. For this reason, anchor words cannot be acquired with auxiliary resources such as dictionaries, and other methods must be used. This paper proposes a new method for acquiring an anchor word candidate set through statistics and similarity computation. Satisfactory, usable results can be obtained by controlling the occurrence frequency of the counted strings and the similarity value. Experimental results show that a high frequency threshold combined with a high similarity yields high precision. The method is therefore an effective way to acquire anchor words.
Source Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2003, Issue 32, pp. 55-57, 80 (4 pages)
Funding National Natural Science Foundation of China (Grant No. 60083006); National 973 Basic Research Program of China (Grant No. G19980305011)
Keywords Bilingual corpora, Anchor word, Substring reduction, Vector, Similarity
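
The abstract gives the method only in outline: count strings, keep those above a frequency threshold, and compare candidates by similarity. The Python sketch below is one possible reading of that outline, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the use of token n-grams as the "statistical strings", a simple form of substring reduction (a string is dropped when a longer string containing it has the same frequency), position-histogram vectors over `bins` equal slices of each text, cosine similarity, and the default values of `min_freq` and `min_sim`.

```python
from collections import Counter
from math import sqrt


def count_strings(tokens, max_len=3):
    """Count every contiguous token string of length 1..max_len."""
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def contains(longer, shorter):
    """True if `shorter` occurs as a contiguous run inside `longer`."""
    n = len(shorter)
    return any(longer[i:i + n] == shorter for i in range(len(longer) - n + 1))


def reduce_substrings(counts):
    """Assumed substring reduction: drop a string when a longer counted
    string contains it and has exactly the same frequency."""
    kept = {}
    for s, f in counts.items():
        absorbed = any(
            len(t) > len(s) and g == f and contains(t, s)
            for t, g in counts.items()
        )
        if not absorbed:
            kept[s] = f
    return kept


def position_vector(tokens, string, bins):
    """Occurrences of `string` in each of `bins` equal slices of the text."""
    vec = [0] * bins
    n = len(string)
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == string:
            vec[min(i * bins // len(tokens), bins - 1)] += 1
    return vec


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def anchor_candidates(src, tgt, min_freq=2, min_sim=0.9, bins=4, max_len=3):
    """Pair source and target strings that are frequent enough and whose
    position vectors are similar enough; best matches come first."""
    def frequent(tokens):
        reduced = reduce_substrings(count_strings(tokens, max_len))
        return {s: position_vector(tokens, s, bins)
                for s, f in reduced.items() if f >= min_freq}

    src_vecs, tgt_vecs = frequent(src), frequent(tgt)
    pairs = []
    for s, u in src_vecs.items():
        for t, v in tgt_vecs.items():
            sim = cosine(u, v)
            if sim >= min_sim:
                pairs.append((s, t, sim))
    return sorted(pairs, key=lambda p: -p[2])
```

In this sketch, raising `min_freq` and `min_sim` shrinks the candidate set to strings that recur often and are distributed alike in both texts, which is in line with the abstract's remark that a high threshold and a high similarity give high precision. No dictionary is consulted at any step.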