Resolving error accumulation of automatically acquiring bilingual lexical knowledge by semantic similarity (cited by: 1)
Abstract: While building a bidirectional translation dictionary from a large-scale English-Chinese parallel corpus, the authors found that error accumulation prevents existing word-alignment techniques from directly yielding high-quality bilingual lexical knowledge. To address this, the paper proposes computing semantic similarity with HowNet and WordNet and then filtering candidate word senses against a similarity threshold. Preliminary experiments indicate that the method is effective and feasible. An application-oriented comparison of the HowNet-based and WordNet-based similarity measures leads to the conclusion that HowNet, which draws finer-grained semantic distinctions, achieves higher recall, whereas WordNet achieves higher precision.
Source: Journal of Harbin Engineering University (《哈尔滨工程大学学报》; indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2006, Issue B07, pp. 575-579 (5 pages).
Funding: National Natural Science Foundation of China (Grant No. 60375019).
Keywords: word alignment; knowledge acquisition; HowNet; WordNet; similarity; semantic dictionary; error accumulation
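
The threshold-based filtering described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy taxonomy `PARENT`, the word-to-concept lookup `CONCEPT`, the path-based formula 1 / (1 + d), the candidate pairs, and the threshold values are all assumptions made for illustration, standing in for HowNet or WordNet and for whatever similarity measure the paper actually uses.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy taxonomy (child -> parent); a stand-in for HowNet or WordNet.
PARENT: Dict[str, Optional[str]] = {
    "entity": None,
    "animal": "entity",
    "artifact": "entity",
    "dog": "animal",
    "cat": "animal",
    "vehicle": "artifact",
    "car": "vehicle",
}

# Hypothetical sense lookup: surface word -> concept in the toy taxonomy.
CONCEPT: Dict[str, str] = {
    "狗": "dog", "dog": "dog", "cat": "cat",
    "汽车": "car", "car": "car", "vehicle": "vehicle",
}


def path_to_root(concept: str) -> List[str]:
    """Return the chain of concepts from `concept` up to the taxonomy root."""
    path = [concept]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    return path


def similarity(a: str, b: str) -> float:
    """Path-based similarity 1 / (1 + d), where d is the number of edges
    between a and b through their lowest common ancestor (an assumed measure)."""
    up_a, up_b = path_to_root(a), path_to_root(b)
    steps_from_b = {c: i for i, c in enumerate(up_b)}
    for steps_from_a, c in enumerate(up_a):
        if c in steps_from_b:
            return 1.0 / (1.0 + steps_from_a + steps_from_b[c])
    return 0.0  # no common ancestor


def filter_pairs(pairs: List[Tuple[str, str]], threshold: float) -> List[Tuple[str, str]]:
    """Keep only aligned word pairs whose concepts are at least `threshold` similar.
    Pairs with a word missing from the lookup are discarded."""
    kept = []
    for zh, en in pairs:
        if zh in CONCEPT and en in CONCEPT:
            if similarity(CONCEPT[zh], CONCEPT[en]) >= threshold:
                kept.append((zh, en))
    return kept


# Hypothetical word-alignment output, including erroneous pairs.
candidates = [("狗", "dog"), ("狗", "cat"), ("汽车", "car"),
              ("汽车", "vehicle"), ("汽车", "dog")]

print(filter_pairs(candidates, 0.4))
print(filter_pairs(candidates, 0.3))
```

In this sketch, lowering the threshold from 0.4 to 0.3 admits the looser pair (狗, cat), which shows how the threshold trades precision against recall; the granularity of the underlying semantic resource shifts that trade-off in a similar way.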