A representative-point-based clustering algorithm for distributed data streams (cited by: 1)

Representative-based distribute data stream clustering algorithm

Abstract: To discover clusters of different shapes in a distributed data stream environment, this paper proposes a clustering algorithm based on representative points. Building on the definition of representative points, the algorithm first introduces the concept of a circular-point and an iterative procedure for finding density-connected circular-points, and uses them to generate a local model at each remote site. It then merges the local models at the coordinator site to generate the global clusters. Experiments on real and synthetic datasets show that, by using representative points, the algorithm can find clusters of different shapes and significantly reduces the volume of transmitted data, while a test-update procedure for the local model avoids frequent data transmission.
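The remote-site / coordinator flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's definitions of representative points or circular-points, so everything here is an assumption made for illustration: `local_representatives` uses a simple greedy covering rule as a stand-in for the local model, `needs_update` is a hypothetical version of the "test" half of the test-update step, and `merge_models` links representatives that lie within a hypothetical `link` distance as a stand-in for density-connectivity. It is not the authors' algorithm.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def local_representatives(points: List[Point], eps: float) -> List[Point]:
    """Remote site (assumed rule): keep a point as a representative only if
    no representative chosen so far lies within eps of it."""
    reps: List[Point] = []
    for p in points:
        if all(dist(p, r) > eps for r in reps):
            reps.append(p)
    return reps


def needs_update(point: Point, reps: List[Point], eps: float) -> bool:
    """Test step (assumed rule): a new stream point forces the local model
    to be resent only if no current representative covers it."""
    return all(dist(point, r) > eps for r in reps)


def merge_models(models: List[List[Point]], link: float) -> List[List[Point]]:
    """Coordinator (assumed rule): pool the representatives from all sites
    and group those that are chained together by distances <= link."""
    reps = [r for model in models for r in model]
    parent = list(range(len(reps)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(reps)):
        for j in range(i + 1, len(reps)):
            if dist(reps[i], reps[j]) <= link:
                parent[find(i)] = find(j)

    clusters: Dict[int, List[Point]] = {}
    for i, r in enumerate(reps):
        clusters.setdefault(find(i), []).append(r)
    return list(clusters.values())


# Two hypothetical remote sites, each summarising its own points.
site_a = local_representatives([(0.0, 0.0), (0.2, 0.0), (1.0, 0.0), (1.1, 0.1)], eps=0.5)
site_b = local_representatives([(2.0, 0.0), (10.0, 10.0)], eps=0.5)
global_clusters = merge_models([site_a, site_b], link=1.0)
```

In this sketch only the representatives travel to the coordinator, and a site resends its model only when `needs_update` returns true, which is the intuition behind the reduced transmission volume claimed in the abstract. Chaining representatives by distance lets an elongated group form a single cluster, which a centroid-based summary would not capture.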

Authors: Gao Bing (高兵), Zhang Jianpei (张健沛), Yang Jing (杨静)

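Affiliations: College of Computer Science and Technology, Harbin Engineering University; Department of Computer Science, Dalian Neusoft Institute of Information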

Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list; 2012, No. 8, pp. 2845-2848 (4 pages)

Funding: National Natural Science Foundation of China (61073043); Natural Science Foundation of Heilongjiang Province (F201023)

Keywords: distributed data stream; data mining; clustering; cluster evolving; representative point

Classification: TP311 [Automation and Computer Technology: Computer Software and Theory]
