
Research and Realization on a New Clustering Algorithm for Web Log (一种新的Web日志聚类算法的研究与实现)
Cited by: 2
Abstract: Most traditional algorithms for clustering Web logs require the user to specify the number of clusters in advance, and their results depend heavily on the initial choice of elements that represent each cluster. This paper proposes a new adaptive clustering algorithm and applies it to user sessions extracted from Web logs. The algorithm combines ideas from agglomerative and partitional clustering. It uses the degree of dissimilarity between every pair of sessions in the initial data set as the distance measure, merges pairs of sessions whose distance falls below a given threshold to form the initial clusters, and then, following a set of rules, dynamically merges the closest session clusters or sessions, so that natural clusters emerge. The algorithm's effectiveness is verified by comparing the average intra-cluster distance with the inter-cluster distances of the resulting session clusters. Its main advantages are that it forms clusters automatically, without the user specifying the number of clusters beforehand, and that it identifies outliers effectively. Experiments show that the algorithm produces high-quality clusters.
Source: Modern Electronics Technique (《现代电子技术》), 2007, No. 24, pp. 139-142 (4 pages)
Keywords: degree of dissimilitude; agglomerative clustering; adaptive clustering; user session
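The abstract does not give the dissimilarity formula or the merging rules, so the following Python sketch only illustrates the general idea of threshold-driven agglomerative clustering of sessions. It is not the authors' implementation. It assumes that a session is a set of visited URLs, that dissimilarity is the Jaccard distance, and that cluster distance is the average pairwise dissimilarity. It also collapses the two phases described in the abstract into a single merge loop. All function names and the `threshold` value are hypothetical.

```python
from itertools import combinations


def dissimilarity(a, b):
    """Jaccard distance between two sessions (sets of URLs). Assumed measure."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def cluster_distance(c1, c2, sessions):
    """Average pairwise dissimilarity between two clusters. Assumed linkage."""
    total = sum(dissimilarity(sessions[i], sessions[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def mean_intra_distance(cluster, sessions):
    """Average dissimilarity between members of one cluster (0.0 for singletons)."""
    pairs = list(combinations(cluster, 2))
    if not pairs:
        return 0.0
    return sum(dissimilarity(sessions[i], sessions[j]) for i, j in pairs) / len(pairs)


def adaptive_cluster(sessions, threshold):
    """Repeatedly merge the closest pair of clusters while their distance
    stays below `threshold`. The number of clusters is never specified.

    Returns (groups, outliers): groups are lists of session indices,
    outliers are indices of sessions that were never merged.
    """
    clusters = [[i] for i in range(len(sessions))]
    while len(clusters) > 1:
        # Find the closest pair of current clusters (a < b by construction).
        d, a, b = min(
            (cluster_distance(clusters[a], clusters[b], sessions), a, b)
            for a, b in combinations(range(len(clusters)), 2)
        )
        if d >= threshold:
            break  # no remaining pair is close enough to merge
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    groups = [sorted(c) for c in clusters if len(c) > 1]
    outliers = [c[0] for c in clusters if len(c) == 1]
    return groups, outliers


# Toy data, invented for illustration only.
sessions = [
    {"/", "/news", "/news/1"},
    {"/", "/news", "/news/2"},
    {"/shop", "/shop/cart"},
    {"/shop", "/shop/cart", "/shop/pay"},
    {"/about"},
]
groups, outliers = adaptive_cluster(sessions, threshold=0.6)
print(groups, outliers)
```

On this toy input the two news sessions and the two shop sessions each form a cluster, and the `/about` session is left as an outlier. Comparing `mean_intra_distance` for a cluster with `cluster_distance` between clusters mirrors the validation described in the abstract: a good result has small distances within clusters and large distances between them.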


