Resolving error accumulation in the automatic acquisition of bilingual lexical knowledge by semantic similarity

Cited by: 1


Abstract: While building a bidirectional bilingual translation dictionary from a large-scale English-Chinese parallel corpus, the authors found that, because of error accumulation, existing word alignment techniques cannot directly yield high-quality bilingual lexical knowledge. They therefore propose a method that computes semantic similarity with HowNet and WordNet and then filters word senses against a similarity threshold. Preliminary experiments indicate that the method is effective and feasible. An application-oriented comparison of the HowNet-based and WordNet-based similarity measures leads to the conclusion that HowNet draws finer-grained semantic distinctions and therefore achieves higher recall, whereas WordNet achieves higher precision.
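The threshold filtering described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy word pairs, and the similarity scores are all invented here, and `toy_similarity` merely stands in for a HowNet-based or WordNet-based similarity measure, whose exact formulation the abstract does not give.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (English word, Chinese word) candidate from word alignment


def filter_by_similarity(
    candidates: Iterable[Pair],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> List[Pair]:
    """Keep only the candidate pairs whose similarity reaches the threshold."""
    return [pair for pair in candidates if similarity(*pair) >= threshold]


def precision_recall(kept: Iterable[Pair], gold: Iterable[Pair]) -> Tuple[float, float]:
    """Precision and recall of the kept pairs against a gold-standard set."""
    kept_set, gold_set = set(kept), set(gold)
    correct = len(kept_set & gold_set)
    precision = correct / len(kept_set) if kept_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical scores, invented for illustration only.
TOY_SCORES = {
    ("bank", "银行"): 0.9,
    ("bank", "河岸"): 0.7,
    ("bank", "的"): 0.1,   # spurious alignment with a function word
    ("river", "河"): 0.8,
    ("river", "银行"): 0.2,  # spurious alignment caused by co-occurrence
}


def toy_similarity(en: str, zh: str) -> float:
    """Stand-in for a semantic-dictionary similarity measure (assumption)."""
    return TOY_SCORES.get((en, zh), 0.0)


GOLD = [("bank", "银行"), ("bank", "河岸"), ("river", "河")]

loose = filter_by_similarity(TOY_SCORES, toy_similarity, threshold=0.5)
strict = filter_by_similarity(TOY_SCORES, toy_similarity, threshold=0.85)
print(loose, precision_recall(loose, GOLD))
print(strict, precision_recall(strict, GOLD))
```

Raising the threshold discards more spurious pairs but also drops correct low-scoring translations, which is the precision and recall trade-off that the comparison between HowNet and WordNet turns on.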

Authors: Liu Pengyuan (刘鹏远), Zhao Tiejun (赵铁军), Li Sheng (李生), Yang Muyun (杨沐昀)

Affiliation: School of Computer Science, Harbin Institute of Technology (哈尔滨工业大学计算机学院)

Source: Journal of Harbin Engineering University (《哈尔滨工程大学学报》; indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2006, Issue B07, pp. 575-579 (5 pages)

Funding: National Natural Science Foundation of China (Grant No. 60375019).

Keywords: word alignment; knowledge acquisition; HowNet; WordNet; similarity; semantic dictionary; error accumulation

Classification number: TP182 [Automation and Computer Technology: Control Theory and Control Engineering]

