Measuring Semantic Similarity between Words Using Web Search Engines (cited by: 21)
Abstract: Semantic similarity measures play an important role in many Web-related tasks such as Web browsing and query suggestion. Because traditional taxonomy-based methods cannot handle continually emerging new words, Web-based methods have recently been proposed to address this problem. However, Web data contain a large amount of noise and redundancy, so robustness and accuracy remain challenges. This paper proposes a method for measuring semantic similarity between words that integrates the page counts and snippets returned by Web search engines. Semantic snippets and the number of search results are used to remove noise and redundancy from the Web snippets. A further method is then proposed that integrates page counts, semantic snippets, and the number of search results already displayed. The proposed method requires no prior knowledge, ontology, or human-annotated data and can be applied to Web-related tasks easily. It achieves a correlation coefficient of 0.851 on the Rubenstein-Goodenough benchmark dataset, outperforming existing Web-based semantic similarity methods, and it also performs well in query expansion for search engines, improving the quality of query suggestions over some page-count-based methods.
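Author: Chen Haiyan (陈海燕)
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2015, No. 1, pp. 261-267 (7 pages)
Funding: Supported by the National Social Science Foundation of China (06BFX051) and the Shanghai special research fund for the selection and training of outstanding young university teachers (hzf05046)
Keywords: semantic similarity; information retrieval; query suggestion; Web search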
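
The abstract describes combining search-engine page counts with returned snippets but does not give the formulas. The sketch below is only an illustration of that general idea, not the paper's method. It uses a Jaccard coefficient over page counts (a common page-count co-occurrence measure in Web-based similarity work) and a simple token overlap between snippet sets. The function names, the `min_cooccurrence` threshold, and the `weight` parameter are all hypothetical, and the counts and snippets are supplied by the caller rather than fetched from any real search API.

```python
import re
from typing import Iterable, Set


def web_jaccard(count_p: int, count_q: int, count_pq: int,
                min_cooccurrence: int = 5) -> float:
    """Jaccard coefficient over page counts.

    count_p, count_q: page counts for each word queried alone.
    count_pq: page count for the conjunctive query "p AND q".
    min_cooccurrence is an assumed noise threshold, not a value from the paper.
    """
    if count_pq <= min_cooccurrence:
        return 0.0  # treat rare co-occurrence as noise
    denominator = count_p + count_q - count_pq
    if denominator <= 0:
        return 0.0
    return count_pq / denominator


def _tokens(snippets: Iterable[str]) -> Set[str]:
    """Lowercased word tokens pooled from a collection of snippets."""
    pooled: Set[str] = set()
    for snippet in snippets:
        pooled.update(re.findall(r"\w+", snippet.lower()))
    return pooled


def snippet_overlap(snippets_p: Iterable[str], snippets_q: Iterable[str]) -> float:
    """Jaccard overlap between the vocabularies of two snippet sets."""
    tokens_p, tokens_q = _tokens(snippets_p), _tokens(snippets_q)
    union = tokens_p | tokens_q
    if not union:
        return 0.0
    return len(tokens_p & tokens_q) / len(union)


def combined_similarity(page_score: float, snippet_score: float,
                        weight: float = 0.5) -> float:
    """Weighted average of the two scores; the weighting is an assumption."""
    return weight * page_score + (1.0 - weight) * snippet_score
```

In this sketch the two signals are scored independently and then averaged, so either one can be swapped for a different measure without touching the other. The abstract suggests the paper's own integration also accounts for the number of search results already displayed, which is not modeled here.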

