Japanese word sense disambiguation system based on deep feature extraction
Abstract: The features used for word sense disambiguation (WSD) come from the context. Japanese shares linguistic characteristics with both Chinese and English, which makes feature extraction more complicated. To address these characteristics, the system builds on a logical model of WSD and exploits the maximum entropy model's ability to integrate heterogeneous information: a deep feature extraction method introduces semantic and syntactic features for resolving ambiguity. In addition, to avoid skewed sense assignment, the sense sequence is tagged with the BeamSearch algorithm. Experimental results show that, compared with methods that use only surface lexical features, the proposed Japanese WSD system improves disambiguation accuracy by 2% to 3%, and improves accuracy on verbs by 5%.
Source: Journal of University of Science and Technology Beijing (indexed in EI, CAS, CSCD, and the Peking University Core Journals list), 2010, No. 2, pp. 263-269 (7 pages)
Funding: National High Technology Research and Development Program of China (No. 2007AA01Z170)
Keywords: natural language processing; word sense disambiguation; maximum entropy model; feature extraction
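
The abstract says the sense sequence is tagged with beam search so that senses are not assigned word by word in isolation. The following is only a minimal sketch of that idea, not the paper's implementation: the function name `beam_search_senses`, the `scorer` interface, and the toy probability table are all assumptions made for illustration, and the toy table stands in for a trained maximum entropy model.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical scorer interface (an assumption, not from the paper): given the
# word sequence, a position, and the senses already chosen for earlier words,
# return a distribution P(sense | context) for the word at that position.
Scorer = Callable[[Sequence[str], int, Tuple[str, ...]], Dict[str, float]]


def beam_search_senses(words: Sequence[str], scorer: Scorer,
                       beam_width: int = 3) -> List[str]:
    """Return the highest-scoring sense sequence found within the beam."""
    # Each beam entry is (cumulative log-probability, senses chosen so far).
    beams: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    for i in range(len(words)):
        candidates = []
        for logp, history in beams:
            for sense, p in scorer(words, i, history).items():
                if p > 0.0:  # skip impossible senses; log(0) is undefined
                    candidates.append((logp + math.log(p), history + (sense,)))
        # Keep only the beam_width best partial sequences.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = candidates[:beam_width]
    return list(beams[0][1]) if beams else []


def toy_scorer(words: Sequence[str], i: int,
               history: Tuple[str, ...]) -> Dict[str, float]:
    """Invented two-word example; a real system would query a trained model."""
    if i == 0:
        return {"A": 0.6, "B": 0.4}
    # The second word's sense distribution depends on the first word's sense.
    if history[-1] == "A":
        return {"X": 0.55, "Y": 0.45}
    return {"X": 0.9, "Y": 0.1}


words = ["w1", "w2"]  # placeholder tokens, not real Japanese data
greedy = beam_search_senses(words, toy_scorer, beam_width=1)
beamed = beam_search_senses(words, toy_scorer, beam_width=2)
```

In this toy setting a beam width of 1 commits to sense A for the first word (0.6 x 0.55 = 0.33 overall), whereas a beam width of 2 keeps sense B alive and finds the better sequence B, X (0.4 x 0.9 = 0.36). This is the kind of locally attractive but globally worse assignment that sequence-level search is meant to avoid.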
