
Web document clustering based on web-log mining (cited by: 5)
Abstract: Web log mining is a branch of web mining. This paper introduces the general process of web log mining, studies the k-means clustering algorithm, and analyzes its shortcomings. In every iteration, k-means has to compute the distance from each data object to the cluster centroids, which makes clustering inefficient. To address this problem, an improved k-means algorithm is proposed that avoids repeatedly computing the distances between data objects and cluster centroids. Web document clustering is implemented with both algorithms, and the experimental results show that the improved algorithm raises clustering efficiency.
Source: Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 18, pp. 4708-4710 (3 pages)
Keywords: web log mining; web log; k-means; web document clustering; data preprocessing
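
The abstract does not spell out how the improved algorithm avoids recomputing distances, so the following is only a minimal sketch of one known heuristic of this kind, not the paper's actual method. It assumes each point remembers its cluster label and its distance to that cluster's centroid from the previous pass. If the point is no farther from its updated centroid than before, it keeps its label and the other k-1 distances are skipped. The names `enhanced_kmeans`, `euclidean`, `mean`, and `full_scans` are hypothetical, and because the shortcut can overlook a nearer centroid, the result may differ from standard k-means.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def enhanced_kmeans(points, k, max_iter=100, seed=0):
    """Cluster points into k groups, skipping full distance scans when possible.

    Returns (labels, centroids, full_scans); full_scans counts how often a
    point had to be compared against all k centroids.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]

    # Initial pass: every point is compared against every centroid.
    labels, dists = [], []
    for p in points:
        d = [euclidean(p, c) for c in centroids]
        j = min(range(k), key=d.__getitem__)
        labels.append(j)
        dists.append(d[j])
    full_scans = len(points)

    for _ in range(max_iter):
        # Recompute centroids; an empty cluster keeps its old centroid.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = mean(members)

        changed = False
        for i, p in enumerate(points):
            d_own = euclidean(p, centroids[labels[i]])
            if d_own <= dists[i]:
                # Assumed shortcut: not farther than before, so keep the
                # label and skip the distances to the other centroids.
                dists[i] = d_own
                continue
            # Otherwise fall back to a full scan over all centroids.
            d = [euclidean(p, c) for c in centroids]
            j = min(range(k), key=d.__getitem__)
            full_scans += 1
            if j != labels[i]:
                changed = True
            labels[i] = j
            dists[i] = d[j]
        if not changed:
            break
    return labels, centroids, full_scans
```

For web documents, each point would be a feature vector derived from the preprocessed log (for example, one entry per user session that visited the page); that representation is likewise an assumption here. Calling `enhanced_kmeans(vectors, k)` returns the labels together with `full_scans`, which can be compared against the `n * k` distance computations per iteration of plain k-means to gauge the saving.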


