Clickstream Clustering Based on Closed Frequent Gapped Subsequence (基于闭合有间隔频繁子序列的点击流聚类)

Cited by: 5
Abstract: Clustering the clickstream sequences recorded in website log files reveals visitors' usage patterns and allows the visitors to be categorized. Traditional clustering methods, however, struggle to extract representative feature vectors from clickstreams, and both the clickstreams and their feature vectors are sparse. To address these problems, a clickstream clustering method based on mining closed repetitive gapped subsequence patterns is proposed. The method extracts the repetitive support of subsequence patterns from the clickstreams, constructs feature vectors from it, and judges the similarity between vectors with a fuzzy dissimilarity measure based on a bidirectional projected Euclidean distance, which improves the clustering quality of the BIRCH algorithm on clickstream data.
Authors: 马超 (Ma Chao), 沈微 (Shen Wei)
Source: 《计算机工程》 (Computer Engineering), indexed in CAS, CSCD, and 北大核心 (Peking University Core Journals), 2010, No. 23, pp. 72-75 (4 pages)
Keywords: clickstream; clustering; frequent subsequence pattern; Web-usage mining
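
The feature-construction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gapped_support`, `feature_vector`, and `euclidean` are hypothetical, the support count below (greedy, left-to-right, disjoint occurrences with gaps allowed) is a simplification that may differ from the paper's definition of repetitive support, and the plain Euclidean distance only stands in for the paper's fuzzy bidirectional projected measure, which the abstract does not specify. Pattern mining and BIRCH itself are left out.

```python
from math import sqrt


def gapped_support(pattern, stream):
    """Count disjoint left-to-right occurrences of `pattern` in `stream`.

    Items of the pattern must appear in order, but other pages may sit
    between them (gaps). This greedy count is an illustrative assumption,
    not necessarily the paper's repetitive support.
    """
    count = 0
    i = 0  # index of the next pattern item to match
    for page in stream:
        if page == pattern[i]:
            i += 1
            if i == len(pattern):  # one full occurrence completed
                count += 1
                i = 0
    return count


def feature_vector(stream, patterns):
    """Map one clickstream to a vector with one support count per pattern."""
    return [gapped_support(p, stream) for p in patterns]


def euclidean(u, v):
    """Plain Euclidean distance, a placeholder for the paper's fuzzy measure."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical page identifiers and patterns, for illustration only.
patterns = [("A", "B"), ("B", "A"), ("A", "C")]
session_1 = list("AXBAB")
session_2 = list("BAC")

v1 = feature_vector(session_1, patterns)
v2 = feature_vector(session_2, patterns)
print(v1, v2, euclidean(v1, v2))
```

Vectors built this way could then be passed to any vector-based clustering routine; the abstract names BIRCH for that step.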

References (6)

  • 1. Banerjee A, Ghosh J. Clickstream Clustering Using Weighted Longest Common Subsequences[C]//Proc. of the 1st Web Mining Workshop Conference on Data Mining. Chicago, USA: [s. n.], 2001.
  • 2. Park S, Nallan C. Sequence-based Clustering for Web Usage Mining: A New Experimental Framework and ANN-enhanced K-means Algorithm[J]. Data & Knowledge Engineering, 2008, 65(3): 512-543.
  • 3. Ding Bolin, David L, Han Jiawei. Efficient Mining of Closed Repetitive Gapped Subsequences from a Sequence Database[C]//Proc. of the 25th International Conference on Data Engineering. Shanghai, China: [s. n.], 2009.
  • 4. Fu Yongjian, Sandhu K. Clustering of Web Users Based on Access Patterns[C]//Proc. of 1999 KDD Workshop on Web Mining. [S. l.]: IEEE Press, 1999.
  • 5. Fang Yuankang, Hu Xuegang, Xia Qishou. Optimized Session Identification Method in Web Log Preprocessing[J]. Computer Engineering, 2009, 35(7): 49-51. (in Chinese) Cited by: 11
  • 6. Shahabi C, Kashani F B. Efficient and Anonymous Web Usage Mining for Web Personalization[J]. INFORMS Journal on Computing, 2003, 15(2): 123-147.
