Online Clustering of Evolution Data Stream Based on Affinity Propagation Clustering (cited by: 5)
Abstract: To improve the accuracy and timeliness of data stream clustering, an efficient data stream clustering algorithm, TCAPStream, is proposed that combines temporal characteristics with affinity propagation. The algorithm uses an improved WAP algorithm to merge newly detected cluster patterns into the clustering model, and uses the temporal density of micro-clusters to characterize the temporal evolution of the data stream. An online dynamic deletion mechanism is also proposed to maintain the micro-clusters. As a result, the model reflects both the temporal characteristics and the distribution of the data stream, which yields more accurate clustering results. Experimental results show that the proposed algorithm achieves good clustering quality on several synthetic and real datasets, and also has good scalability and extensibility.
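The abstract does not give the formulas behind the micro-cluster temporal density or the online deletion mechanism. The following is only a minimal sketch of the general idea, assuming an exponentially fading density and a fixed deletion threshold. The names `MicroCluster`, `decayed_density`, `absorb`, and `prune`, and the parameters `decay` and `min_density`, are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class MicroCluster:
    """Hypothetical micro-cluster summary: a center plus a time-decayed weight."""
    center: list
    density: float = 1.0      # weight as of last_update
    last_update: float = 0.0  # timestamp of the most recent absorbed point

    def decayed_density(self, now: float, decay: float) -> float:
        # Assumed fading form: the weight halves every 1/decay time units.
        return self.density * 2.0 ** (-decay * (now - self.last_update))

    def absorb(self, point, now: float, decay: float) -> None:
        # Fade the old weight to the current time, then add the new point
        # with unit weight and move the center to the weighted mean.
        old = self.decayed_density(now, decay)
        new = old + 1.0
        self.center = [(c * old + x) / new for c, x in zip(self.center, point)]
        self.density = new
        self.last_update = now


def prune(clusters, now: float, decay: float, min_density: float):
    # Online deletion (assumed rule): drop micro-clusters whose faded
    # density has fallen below the threshold.
    return [mc for mc in clusters if mc.decayed_density(now, decay) >= min_density]
```

Under these assumptions, a micro-cluster that stops receiving points fades out and is eventually removed by `prune`, while one that keeps absorbing points stays in the model. The actual TCAPStream definitions may differ.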
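Source: Pattern Recognition and Artificial Intelligence (《模式识别与人工智能》), indexed in EI, CSCD, and the Peking University Core Journals list, 2014, No. 5, pp. 443-451 (9 pages)
Funding: Supported by the National High Technology Research and Development Program of China (863 Program) (No. 2011AA010603, 2011AA010605)
Keywords: Data Mining; Affinity Propagation Clustering; Temporal Density; Model Reconstruction; Data Stream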
References (15)

  • 1. Ntoutsi I, Zimek A, Palpanas T, et al. Density-Based Projected Clustering over High Dimensional Data Streams//Proc of the 12th SIAM International Conference on Data Mining. Anaheim, USA, 2012: 987-998.
  • 2. Kranen P, Kremer H, Jansen T, et al. Stream Data Mining Using the MOA Framework//Proc of the 17th International Conference on Database Systems for Advanced Applications. Busan, Republic of Korea, 2012: 309-313.
  • 3. Halkidi M, Koutsopoulos I. Online Clustering of Distributed Streaming Data Using Belief Propagation Techniques//Proc of the 12th IEEE International Conference on Mobile Data Management. Luleå, Sweden, 2011, I: 216-225.
  • 4. Aggarwal C C, Han J W, Wang J Y, et al. A Framework for Clustering Evolving Data Streams//Proc of the 29th International Conference on Very Large Data Bases. Berlin, Germany, 2003: 81-92.
  • 5. Aggarwal C C, Han J W, Wang J Y, et al. A Framework for Projected Clustering of High Dimensional Data Streams//Proc of the 30th International Conference on Very Large Data Bases. Toronto, Canada, 2004: 852-863.
  • 6. Cao F, Ester M, Qian W N, et al. Density-Based Clustering over an Evolving Data Stream with Noise//Proc of the 6th SIAM International Conference on Data Mining. Bethesda, USA, 2006: 328-339.
  • 7. Chen Y X, Tu L. Density-Based Clustering for Real-Time Stream Data//Proc of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Jose, USA, 2007: 133-142.
  • 8. Zhang X L, Furtlehner C, Sebag M. Data Streaming with Affinity Propagation//Proc of the European Conference on Machine Learning and Knowledge Discovery in Databases. Antwerp, Belgium, 2008: 628-643.
  • 9. Yang Ning, Tang Changjie, Wang Yue, Chen Yu, Zheng Jiaoling. Clustering Algorithm on Data Stream with Skew Distribution Based on Temporal Density [J]. Journal of Software, 2010, 21(5): 1031-1041. (cited by: 17)
  • 10. Zhang X L, Furtlehner C, Perez J, et al. Toward Autonomic Grids: Analyzing the Job Flow with Affinity Streaming//Proc of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Paris, France, 2009: 987-996.

Secondary references (14)

  • 1. Jin Cheqing, Qian Weining, Zhou Aoying. Analysis and Management of Streaming Data: A Survey [J]. Journal of Software, 2004, 15(8): 1172-1181. (cited by: 161)
  • 2. Muthukrishnan S. Data Streams: Algorithms and Applications. Hanover, MA, USA: Now Publishers Inc., 2005.
  • 3. Golab L, Ozsu M T. Issues in data stream management. SIGMOD Record, 2003, 32(2): 5-14.
  • 4. Garofalakis M N, Gehrke J. Querying and mining data streams: You only get one look//Proceedings of the 28th International Conference on Very Large Data Bases. Hong Kong, China, 2002: 635.
  • 5. Gaber M M, Zaslavsky A B, Krishnaswamy S. Mining data streams: A review. SIGMOD Record, 2005, 34(2): 18-26.
  • 6. Guha S, Meyerson A, Mishra N, Motwani R, O'Callaghan L. Clustering data streams: Theory and practice. IEEE Transactions on Knowledge and Data Engineering, 2003, 15(3): 515-528.
  • 7. Aggarwal C C, Han Jia-Wei, Wang Jian-Yong, Yu P S. A framework for clustering evolving data streams//Proceedings of the 29th International Conference on Very Large Data Bases. Berlin, Germany, 2003: 81-92.
  • 8. Aggarwal C C, Han Jiawei, Wang Jianyong, Yu P S. A framework for projected clustering of high dimensional data streams//Proceedings of the 30th International Conference on Very Large Data Bases. Toronto, Canada, 2004: 852-863.
  • 9. Aggarwal C C, Yu P S. A framework for clustering massive text and categorical data streams//Proceedings of the 6th SIAM International Conference on Data Mining. Bethesda, MD, USA, 2006: 477-481.
  • 10. Ong K L, Li Wen-Yuan, Ng W K, Lim E P. SCLOPE: An algorithm for clustering data streams of categorical attributes//Proceedings of the 6th International Conference on Data Warehousing and Knowledge Discovery. Zaragoza, Spain, 2004: 209-218.

Co-citing literature: 37

Co-cited literature: 31

Citing literature: 5

Secondary citing literature: 23
