Clustering Models and Algorithms for Distributed Data Streams Based on Data Synopsis (基于数据概要描述的分布式数据流聚类模型与算法)

Cited by: 4
Abstract: Data stream mining addresses knowledge discovery from high-volume streaming data and has been widely studied. A typical example of a data stream is the data collected by sensors. As sensor networks have become widespread, however, such data are in many cases collected and managed in a distributed manner, which creates the need to mine data streams in a distributed way. The main obstacle in distributed data stream mining is the loss of mining quality or efficiency caused by distribution. To support clustering of distributed data streams, this paper discusses a mining model for distributed data streams and, based on that model, designs a corresponding synopsis data structure and the key mining algorithms. The algorithms are evaluated theoretically or verified experimentally. The experiments show that the proposed model and algorithms effectively reduce data communication cost while maintaining high clustering quality for the global pattern.
Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2013, No. 6, pp. 187-191, 202 (6 pages)
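Funding: National Natural Science Foundation of China (62173293); Central University of Finance and Economics teaching reform project fund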
Keywords: distributed data stream; data synopsis; incremental clustering; global pattern