Clustering Algorithm over Distributed Data Streams (分布式数据流聚类算法)

Cited by: 2
Abstract: Data in distributed data streams can overlap across sites or be incomplete, and clustering them must keep communication costs low. To address this, the paper proposes DAM-Distream, a distributed data stream clustering algorithm that combines density-based and model-based clustering. The algorithm describes the distribution of the data stream arriving at each local site with a Gaussian mixture model (GMM), which compresses the data effectively and reflects the overlap between distributed streams well. Because the EM algorithm used to estimate the model parameters is sensitive to initial values, DAM-Distream first applies Hoeffding bound theory and a density-based algorithm to pre-cluster the stream and obtain reasonably accurate initial parameters. EM then refines the clustering iteratively, and the local models are uploaded to the central site, where a strategy of merging approximate models produces the global model. Simulation experiments show that DAM-Distream effectively overcomes the shortcomings of the EM algorithm, obtains better GMM parameters, and improves the clustering quality of data streams in a distributed environment while reducing the system's communication cost.
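
The abstract names two ingredients without giving formulas: a Hoeffding bound and a step that merges approximate models at the central site. The sketch below illustrates both under stated assumptions and is not the paper's actual procedure. It uses one-dimensional Gaussian components, merges by moment matching (a common choice, not confirmed by the abstract), and treats two components as "approximate" when their means differ by less than a threshold. The names `Component`, `hoeffding_epsilon`, `merge`, `merge_close`, and `max_gap` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Component:
    """One 1-D Gaussian component: mixing weight, mean, variance."""
    weight: float
    mean: float
    var: float


def hoeffding_epsilon(value_range: float, delta: float, n: int) -> float:
    """One-sided Hoeffding bound: with probability at least 1 - delta, the
    mean of n independent samples, each confined to an interval of width
    value_range, exceeds the true mean by less than the returned epsilon."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def merge(a: Component, b: Component) -> Component:
    """Merge two weighted components while preserving the total weight,
    mean, and variance of their mixture (moment matching)."""
    w = a.weight + b.weight
    mean = (a.weight * a.mean + b.weight * b.mean) / w
    var = (a.weight * (a.var + (a.mean - mean) ** 2)
           + b.weight * (b.var + (b.mean - mean) ** 2)) / w
    return Component(w, mean, var)


def merge_close(components, max_gap):
    """Greedy pass over components sorted by mean: fold a component into the
    previous one when their means differ by less than max_gap.
    The closeness test is an assumption made for illustration."""
    merged = []
    for c in sorted(components, key=lambda comp: comp.mean):
        if merged and abs(c.mean - merged[-1].mean) < max_gap:
            merged[-1] = merge(merged[-1], c)
        else:
            merged.append(c)
    return merged


# Two local sites report overlapping models; weights are already scaled
# so that the combined mixture sums to 1.
site_a = [Component(0.3, 0.0, 1.0), Component(0.2, 10.0, 1.0)]
site_b = [Component(0.3, 0.2, 1.0), Component(0.2, 10.0, 1.0)]
global_model = merge_close(site_a + site_b, max_gap=1.0)
```

In this toy run the four local components collapse into two global ones, so only component parameters, not raw points, would need to travel to the central site. The bound shrinks as 1/sqrt(n), which is why it is often used to decide when enough stream points have been seen.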
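Source: Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University Core Journal list, 2011, No. 8, pp. 2708-2711, 2763 (5 pages)
Funding: National High Technology Research and Development Program of China (863 Program), grant 2008AA011001
Keywords: distributed data streams; clustering; density-based; model-based; data mining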
