
Time-Attenuation-Based Distributed Data Stream Clustering Algorithm (cited by 1)
Abstract: To discover micro-clusters in a distributed data stream environment, and to account for the forgetting (amnesic) property of data streams, this paper proposes a time-attenuation-based data stream clustering algorithm. Each local site processes its stream incrementally according to an attenuation model and sends its local model to a central site. The central site merges the micro-clusters received from the local sites to produce a global clustering model. Experiments on real and synthetic datasets show that the algorithm achieves good clustering quality and scales well.
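The abstract names three steps: attenuating old data, incrementally maintaining micro-clusters at each local site, and merging the sites' micro-clusters centrally. The paper's exact attenuation function, thresholds and merge rule are not stated here, so the Python below is only a minimal sketch under stated assumptions: an exponential decay factor 2^(-λΔt) with a hypothetical rate `DECAY_LAMBDA`, additive micro-cluster features (weight, linear sum, squared sum) in the style of CluStream-like summaries, and a hypothetical distance threshold `RADIUS` used both to absorb points locally and to merge micro-clusters at the center. All class and function names are illustrative, not the authors'.

```python
import math
from dataclasses import dataclass

DECAY_LAMBDA = 0.1  # hypothetical decay rate; the paper's value is not given
RADIUS = 1.0        # hypothetical absorb/merge distance threshold


def decay(dt: float, lam: float = DECAY_LAMBDA) -> float:
    """Assumed attenuation factor 2^(-lam * dt) for data that is dt time units old."""
    return 2.0 ** (-lam * dt)


@dataclass
class MicroCluster:
    ls: list   # decayed linear sum of the absorbed points
    ss: float  # decayed sum of squared norms (kept for radius/variance, unused here)
    w: float   # decayed weight, i.e. effective number of points
    t: float   # time of the last fade/update

    @classmethod
    def from_point(cls, x, now):
        return cls(list(x), sum(v * v for v in x), 1.0, now)

    def copy(self):
        return MicroCluster(list(self.ls), self.ss, self.w, self.t)

    def fade_to(self, now):
        # Scale every additive feature by the same factor, so the center is unchanged.
        f = decay(now - self.t)
        self.ls = [v * f for v in self.ls]
        self.ss *= f
        self.w *= f
        self.t = now

    def center(self):
        return [v / self.w for v in self.ls]

    def absorb(self, x):
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)
        self.w += 1.0

    def merge(self, other):
        # Additive merge; both clusters must already be faded to the same time.
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss
        self.w += other.w


class LocalSite:
    """Maintains decayed micro-clusters for one stream (timestamps must be non-decreasing)."""

    def __init__(self):
        self.clusters = []

    def insert(self, x, now):
        for mc in self.clusters:
            mc.fade_to(now)
        if self.clusters:
            best = min(self.clusters, key=lambda mc: math.dist(mc.center(), x))
            if math.dist(best.center(), x) <= RADIUS:
                best.absorb(x)
                return
        self.clusters.append(MicroCluster.from_point(x, now))

    def summary(self, now):
        # The local model sent to the central site: copies faded to a common time.
        for mc in self.clusters:
            mc.fade_to(now)
        return [mc.copy() for mc in self.clusters]


def merge_global(summaries, now, merge_dist=RADIUS):
    """Central site: greedily merge micro-clusters whose centers lie within merge_dist."""
    result = []
    for summary in summaries:
        for mc in summary:
            mc = mc.copy()
            mc.fade_to(now)
            target = next((g for g in result
                           if math.dist(g.center(), mc.center()) <= merge_dist), None)
            if target is None:
                result.append(mc)
            else:
                target.merge(mc)
    return result


a, b = LocalSite(), LocalSite()
for t, p in enumerate([(0.0, 0.0), (0.2, 0.1), (5.0, 5.0)]):
    a.insert(p, float(t))
for t, p in enumerate([(0.1, 0.0), (5.1, 4.9)]):
    b.insert(p, float(t))
global_model = merge_global([a.summary(3.0), b.summary(3.0)], now=3.0)
print(len(global_model))  # → 2
```

Because every feature is scaled by the same factor, fading never moves a center; it only shrinks the weight, so a micro-cluster that stops receiving points gradually loses influence. A fuller implementation would typically also prune micro-clusters whose weight falls below a threshold, which this sketch omits.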
Source: Journal of Taiyuan Normal University (Natural Science Edition), 2013, Issue 2, pp. 87-90 (4 pages)
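Funding: Anhui Provincial Outstanding Young Talent Fund (2010SQRL126); Natural Science Foundation of Anhui Province (11040606M151); Natural Science Foundation of Bengbu College (2011ZR11)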
Keywords: distributed data stream; clustering; time attenuation; sliding window