
Semantic Similarity Computing Based on Community Mining of Wikipedia

Cited by: 9
Abstract: Word semantic similarity computation is widely used in natural language processing tasks such as word sense disambiguation, semantic information retrieval, and automatic text classification. Unlike traditional methods, this paper proposes a method for computing word semantic similarity based on community mining of Wikipedia. The method ignores the textual content of word pages and instead exploits Wikipedia's large network of word pages with category labels. HITS, a topic-based community discovery algorithm, is applied to this page network to obtain the community of each word page. Given these communities, the semantic similarity between two words is measured from three aspects: (1) the semantic relation between the two word pages; (2) the semantic relation between the communities of the two word pages; (3) the semantic relation between the categories to which those communities belong. Experiments on the standard WordSimilarity-353 dataset show that the method is feasible and slightly outperforms some classic algorithms; in the best case, its Spearman correlation coefficient reaches 0.58.
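The abstract names HITS as the algorithm applied to the page network but gives no implementation details. The following is a minimal sketch of the standard HITS hub/authority iteration on a toy link graph, not the paper's actual procedure. The page names in `toy_links`, the helper `community`, and the choice to treat the top-authority pages as a word page's "community" are all illustrative assumptions.

```python
from math import sqrt


def hits(links, iterations=50):
    """Standard HITS iteration.

    links: dict mapping a page to the list of pages it links to.
    Returns (hub, authority) score dicts, each L2-normalized.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}

    for _ in range(iterations):
        # Authority update: sum the hub scores of pages that link in.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for dst in targets:
                new_auth[dst] += hub[src]
        norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}

        # Hub update: sum the authority scores of pages linked to.
        new_hub = {p: sum(auth[d] for d in links.get(p, ())) for p in pages}
        norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}

    return hub, auth


def community(links, size=3):
    """Hypothetical helper: the `size` pages with the highest authority."""
    _, auth = hits(links)
    return sorted(auth, key=auth.get, reverse=True)[:size]


# Toy link graph with made-up page names (assumption, not real data).
toy_links = {
    "Car": ["Vehicle", "Engine"],
    "Truck": ["Vehicle"],
    "Vehicle": [],
    "Engine": [],
}

hub_scores, auth_scores = hits(toy_links)
top_pages = community(toy_links, size=2)
```

In this toy graph, "Vehicle" is linked from both "Car" and "Truck", so it receives the highest authority score, and "Car" is the strongest hub because it links to both authorities. How the paper derives communities from the HITS scores, and how it combines the three semantic relations into one similarity value, is not stated in the abstract.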
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2016, No. 4, pp. 45-49 (5 pages)
Funding: Supported by the Key Project of the Fujian Provincial Science and Technology Plan (2011H0028)
Keywords: semantic similarity, community discovery, Wikipedia
