
Representative-based distributed data stream clustering algorithm (cited by: 1)
Abstract: To discover clusters of different shapes in distributed data streams, this paper proposes a representative-point-based clustering algorithm. Building on the definition of representative points, it first introduces the concept of a circular-point and an iterative algorithm for finding density-connected circular-points, from which each remote site generates its local model. It then presents an algorithm that merges the local models at the coordinator site to generate the global clusters. Experiments on real and synthetic datasets show that, by using representative points, the algorithm finds clusters of different shapes and significantly reduces the volume of data transmitted, while its test-update algorithm for local models avoids sending data frequently.
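Source: Application Research of Computers (《计算机应用研究》), indexed in CSCD and the Peking University Core Journals list, 2012, No. 8, pp. 2845-2848 (4 pages)
Funding: National Natural Science Foundation of China (61073043); Natural Science Foundation of Heilongjiang Province (F201023)
Keywords: distributed data stream; data mining; clustering; cluster evolution; representative point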


