Data Stream Clustering Algorithm with Variable Flow Rate Based on Skew Distribution
Abstract TDCA, a clustering algorithm for data streams with skewed distributions, falls short in clustering speed and memory utilization, and a data stream environment with a variable flow rate severely degrades the quality of the clustering results. To address these problems, a data stream clustering algorithm named GR-Stream is proposed. It uses grid cells as the aggregation form for data points and an extended data structure based on the R-tree as the index that organizes the grid cells. On this basis, it introduces a pruning strategy and adjusts the way data points enter the tree. Tests on the real dataset KDD-CUP99 show that, compared with TDCA, the algorithm improves access speed during clustering by 40%, that applying the pruning strategy saves at least half of the memory usage, and that the average purity of the clustering results stays above 90% in a data stream environment with a variable flow rate.
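The abstract names three ingredients: grid cells that aggregate data points, a density that reflects time (the "temporal density" keyword), and a pruning strategy. The sketch below illustrates how those pieces could fit together; it is not GR-Stream itself. The class name `GridDensityIndex`, the parameters `cell_width`, `decay` and `prune_threshold`, and the decay rule 2^(-decay * dt) are all assumptions made for illustration, and a plain dictionary stands in for the R-tree-based index described in the paper.

```python
import math


class GridDensityIndex:
    """Hypothetical sketch: time-decayed density per grid cell, with pruning.

    A dict replaces the paper's R-tree-based index; every name and
    parameter here is an assumption, not taken from the source.
    """

    def __init__(self, cell_width=1.0, decay=0.1, prune_threshold=0.05):
        self.cell_width = cell_width            # side length of a grid cell
        self.decay = decay                      # assumed decay rate per time unit
        self.prune_threshold = prune_threshold  # cells below this density are dropped
        self.cells = {}                         # cell key -> (density, last update time)

    def cell_key(self, point):
        # Map a point to the integer coordinates of the cell containing it.
        return tuple(math.floor(x / self.cell_width) for x in point)

    def _decayed(self, density, last_t, now):
        # Assumed exponential fading: density halves every 1/decay time units.
        return density * 2.0 ** (-self.decay * (now - last_t))

    def insert(self, point, now):
        # Fade the cell's old density to the current time, then count the new point.
        key = self.cell_key(point)
        density, last_t = self.cells.get(key, (0.0, now))
        self.cells[key] = (self._decayed(density, last_t, now) + 1.0, now)
        return key

    def density(self, key, now):
        # Current (faded) density of a cell; 0.0 if the cell is not stored.
        if key not in self.cells:
            return 0.0
        density, last_t = self.cells[key]
        return self._decayed(density, last_t, now)

    def prune(self, now):
        # Remove cells whose faded density fell below the threshold.
        stale = [k for k in self.cells if self.density(k, now) < self.prune_threshold]
        for k in stale:
            del self.cells[k]
        return len(stale)
```

In this sketch, memory is reclaimed only when `prune` is called, so a caller would invoke it periodically. How GR-Stream schedules pruning and how it adapts insertion to a changing flow rate are not stated in the abstract and are left out here.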
Source Computer Engineering (《计算机工程》), CAS, CSCD, 2013, No. 12, pp. 247-250, 259 (5 pages)
Keywords data stream; clustering; temporal density; skew distribution; pruning; variable flow rate