
Clustering and visualization of web search results (cited by: 5)
Abstract: Search engines are now the most common tools for information retrieval on the internet. However, limitations such as low search coverage and the dynamic nature of web pages help explain why users' search experience has seen no breakthrough in recent years. Leading search engines return a long list of records sorted by relevance to the query. Because natural language exhibits synonymy (several words for one meaning) and polysemy (several meanings for one word), users find it hard to express their intent clearly and often spend a long time picking the topics they care about out of the result list.

This paper aims to improve the search experience with data analysis techniques. It clusters search results by their titles and snippets, groups the clustering results according to certain criteria, and presents them visually as trees and graphs, so that users get a quick, complete, and intuitive overview and can locate the information they want quickly. Suffix-tree data structures are widely used in string processing and text compression. A suffix-tree-based clustering algorithm makes it easy to recognize phrases shared among web pages, so it can be used to cluster them. It improves clustering efficiency because it does not need to compute similarities between pairs of documents, and it assigns meaningful labels to the clusters, which improves readability.

Based on this approach, a web clustering prototype system named ECE (effective clustering engine) was built. Experiments show that the algorithm is efficient and that its clustering results are readable and relatively accurate.
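The shared-phrase idea in the abstract can be illustrated with a minimal sketch. It is not the ECE implementation: it assumes whitespace tokenization, uses a plain dictionary of word phrases in place of a real suffix tree (every suffix of every snippet is indexed up to `max_len` words), and scores clusters with a made-up rule (number of snippets times phrase length). The names `build_phrase_index` and `base_clusters` are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace; a real system would also stem and drop stop words."""
    return text.lower().split()


def build_phrase_index(snippets, max_len=4):
    """Map each word phrase (up to max_len words) to the ids of the snippets containing it.

    Indexing every suffix of every snippet, truncated to max_len words, covers the
    same phrases as the nodes of a depth-limited word-level suffix trie.
    """
    index = defaultdict(set)
    for doc_id, text in enumerate(snippets):
        words = tokenize(text)
        for start in range(len(words)):
            last = min(start + max_len, len(words))
            for end in range(start + 1, last + 1):
                index[tuple(words[start:end])].add(doc_id)
    return index


def base_clusters(snippets, min_docs=2, max_len=4):
    """Return (label, doc_ids, score) for each phrase shared by at least min_docs snippets."""
    index = build_phrase_index(snippets, max_len)
    clusters = []
    for phrase, docs in index.items():
        if len(docs) >= min_docs:
            # Assumed scoring rule for illustration: coverage times phrase length.
            score = len(docs) * len(phrase)
            clusters.append((" ".join(phrase), sorted(docs), score))
    # Highest score first; ties broken alphabetically by label.
    clusters.sort(key=lambda c: (-c[2], c[0]))
    return clusters


if __name__ == "__main__":
    snippets = [
        "jaguar car dealer",
        "jaguar car review",
        "jaguar big cat habitat",
        "big cat conservation",
    ]
    for label, docs, score in base_clusters(snippets)[:3]:
        print(label, docs, score)
```

Note that no snippet is ever compared against another snippet: clusters fall out of the phrase index directly, and the shared phrase itself serves as a readable cluster label. A fuller version would also merge clusters whose snippet sets overlap heavily.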
Source: Journal of Nanjing University (Natural Science), 2010, No. 5, pp. 542-551 (10 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (60475019, 60970061); Specialized Research Fund for the Doctoral Program (20060247039)
Keywords: web clustering; suffix tree; visualization; phrase cluster; algorithm

