Word Semantic Similarity Measurement Based on Evidence Theory
Times cited: 4

Abstract: Measuring semantic similarity between words is a classical and active problem in natural language processing, and progress on it has a significant impact on applications such as word sense disambiguation, machine translation, ontology mapping, and computational linguistics. This paper proposes a novel approach to measuring word semantic similarity that combines evidence theory with a knowledge base. First, evidence is extracted from the general-purpose ontology WordNet; second, the reasonableness of the extracted evidence is analyzed with scatter plots; third, basic probability assignments are generated using statistics and piecewise linear interpolation; fourth, a global basic probability assignment is obtained by combining evidence conflict handling, importance allocation, and the Dempster-Shafer (D-S) combination rule; finally, word semantic similarity is quantified from this global assignment. On the R&G(65) data set, evaluated with 5-fold cross-validation, the correlation between the algorithm's judgments and human judgments reaches 0.912, which is 0.4 percentage points higher than the current best method, P&S, and 7% to 13% higher than the classical measures rel_HS, dist_JC, sim_LC, sim_L, and sim_R. Good results are also obtained on M&C(30) and WordSim353, with correlations of 0.915 and 0.941, respectively, and the running efficiency of the algorithm is comparable to that of the classical methods. The experimental results show that using evidence theory to measure word semantic similarity is reasonable and effective.
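The abstract describes the pipeline only at a high level. The Python sketch below illustrates the evidence-theory steps it names, under assumptions that are not taken from the paper: a hypothetical frame of three similarity grades (`low`, `medium`, `high`) with made-up numeric values, a made-up calibration table (`KNOTS`, `PROFILES`) that piecewise linear interpolation turns into a basic probability assignment, Shafer discounting as one possible stand-in for importance allocation, plain Dempster normalization as the only conflict handling, and the pignistic expectation as the final similarity score. All names are hypothetical, and this is not the authors' algorithm.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Sequence

# A focal element is a subset of the frame of discernment; a BPA maps focal elements to mass.
Focal = FrozenSet[str]
BPA = Dict[Focal, float]

# Hypothetical frame of discernment: three similarity grades and an assumed value for each.
GRADES = ("low", "medium", "high")
THETA: Focal = frozenset(GRADES)
GRADE_VALUE = {"low": 0.0, "medium": 0.5, "high": 1.0}

# Hypothetical calibration table: mass given to each grade at a few knots of a
# normalized evidence value in [0, 1]. At every knot the masses sum to <= 1; the
# remainder is left on THETA as ignorance.
KNOTS = [0.0, 0.5, 1.0]
PROFILES = {
    "low": [0.8, 0.2, 0.0],
    "medium": [0.1, 0.5, 0.1],
    "high": [0.0, 0.1, 0.8],
}


def interpolate(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise linear interpolation; xs must be strictly increasing, ends are clamped."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("xs must be sorted in increasing order")


def bpa_from_evidence(x: float, knots: Sequence[float], profiles: Dict[str, List[float]]) -> BPA:
    """Turn one evidence value into a BPA over singleton grades plus THETA (ignorance)."""
    m: BPA = {frozenset([g]): interpolate(x, knots, ys) for g, ys in profiles.items()}
    m[THETA] = max(0.0, 1.0 - sum(m.values()))
    return m


def discount(m: BPA, alpha: float) -> BPA:
    """Shafer discounting with reliability alpha in [0, 1]: scale masses, move the rest to THETA."""
    out: BPA = {A: alpha * v for A, v in m.items() if A != THETA}
    out[THETA] = 1.0 - alpha + alpha * m.get(THETA, 0.0)
    return out


def dempster(m1: BPA, m2: BPA) -> BPA:
    """Dempster's rule: multiply masses of intersecting focal elements, renormalize by 1 - K."""
    combined: BPA = {}
    conflict = 0.0  # K, the total mass that falls on the empty set
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y
    if conflict >= 1.0 - 1e-12:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {A: v / (1.0 - conflict) for A, v in combined.items()}


def similarity(m: BPA) -> float:
    """Expected grade value under the pignistic probability BetP(g) = sum of m(A)/|A| over A containing g."""
    betp = {g: 0.0 for g in GRADES}
    for A, v in m.items():
        for g in A:
            betp[g] += v / len(A)
    return sum(betp[g] * GRADE_VALUE[g] for g in GRADES)


if __name__ == "__main__":
    # Two hypothetical normalized evidence values for one word pair, fused with
    # different (assumed) reliability weights.
    e1 = bpa_from_evidence(0.9, KNOTS, PROFILES)
    e2 = bpa_from_evidence(0.7, KNOTS, PROFILES)
    fused = dempster(discount(e1, 0.9), discount(e2, 0.6))
    print(f"similarity = {similarity(fused):.3f}")
```

In this sketch, discounting a less reliable piece of evidence moves part of its mass to Θ before combination, which reduces its influence on the fused result. A method with dedicated conflict handling would replace or adjust the `1 - K` renormalization inside `dempster`.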
Source: Acta Automatica Sinica, indexed in EI, CSCD, and the Peking University core journal list; 2015, no. 6, pp. 1173-1186 (14 pages).
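Funding: Supported by the National Natural Science Foundation of China (60903098, 60973040, 61300148, 61472049), the Key Scientific and Technological Project of Jilin Province (20130206051GX), and the Youth Fund of the Jilin Province Science and Technology Plan (20130522112JH).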
Keywords: computing with words; statistical learning; evidence theory; uncertainty modeling