Web search result clustering based on K-center and information gain

Cited by: 1

Abstract: Based on the concepts of K-center and information gain, this paper applies a modified FPF (furthest-point-first) algorithm to Web search result clustering and proposes a cluster labeling method that makes the presented clusters easier for users to understand. It also presents a simple and intuitive criterion, NMI, for evaluating cluster quality. The proposed method is evaluated on results returned by an actual Web search engine and compared with the Lingo and K-means algorithms. The results show that the algorithm strikes a good balance between clustering quality and speed, and is better suited to Web search result clustering.
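The abstract names FPF (furthest-point-first) as the basis of the method. As a rough illustration only, the standard FPF heuristic for the K-center problem can be sketched as below. This is not the paper's modified version, whose details are not given here. The function names `fpf_centers` and `assign`, the 2-D sample points, and the use of Euclidean distance are all assumptions made for the example.

```python
import math

def fpf_centers(points, k, dist):
    """Pick k center indices with the standard furthest-point-first heuristic."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Assumption: start from the first point; FPF allows any starting point.
    centers = [0]
    # nearest[i] is the distance from point i to its closest center so far.
    nearest = [dist(p, points[0]) for p in points]
    while len(centers) < k:
        # The next center is the point furthest from all current centers.
        nxt = max(range(len(points)), key=nearest.__getitem__)
        centers.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], dist(p, points[nxt]))
    return centers

def assign(points, centers, dist):
    """Label every point with the index of its closest center."""
    return [min(centers, key=lambda c: dist(p, points[c])) for p in points]

# Hypothetical toy data: two tight pairs and one isolated point.
points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9), (10.0, 0.0)]
centers = fpf_centers(points, 3, math.dist)
labels = assign(points, centers, math.dist)
print(centers, labels)
```

In a search-result setting the points would be snippet vectors and `dist` a text distance, but that choice is outside what the abstract specifies.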

Authors: Ding Zhenguo (丁振国), Meng Xing (孟星)

Affiliation: School of Computer Science, Xidian University (西安电子科技大学计算机学院)

Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 10, pp. 3125-3127 (3 pages)

Funding: National "863" Program of China (2004AA1Z2520); Military Network Interconnection and Information Security Strategy Research Project (2006QB1069)

Keywords: Web document clustering; cluster labeling; K-center; information gain

Classification: TP183 [Automation and Computer Technology: Control Theory and Control Engineering]
