Uncertain Data Stream Clustering Algorithm Based on Gravity Similarity and Relative Density Techniques

Cited by: 5
Abstract: To address the problem of clustering uncertain data streams, this paper proposes a clustering algorithm based on gravity similarity and relative density. The algorithm adopts an online/offline two-stage processing framework and jointly considers the similarity between tuples and the uncertainty of each tuple. In the online stage, it uses gravity similarity to find the micro-cluster to which each incoming tuple may belong, and a new outlier-handling and online maintenance mechanism adapts to the evolution of the data stream. In the offline stage, a relative density algorithm clusters the micro-clusters; it does not require the number of clusters to be specified in advance and can handle micro-clusters of arbitrary shape. Experimental results show that the proposed algorithm achieves higher clustering quality and accuracy than existing clustering methods.
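Authors: ZHENG Qi (郑祺), HUANG Decai (黄德才)
Source: Journal of Shanghai Jiaotong University (《上海交通大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2016, No. 6, pp. 873-878 (6 pages)
Funding: Supported by the Special Scientific Research Program for Public Welfare Industries of the Ministry of Water Resources (201401044)
Keywords: uncertain data stream; clustering; gravity; similarity; relative density; outlier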
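
The online stage described in the abstract can be illustrated with a minimal sketch. The abstract does not give the gravity similarity formula, so everything below is an assumption made for illustration only: a tuple's "mass" is taken to be its existence probability, a micro-cluster's mass is the sum of the probabilities it has absorbed, and similarity is the product of the two masses divided by the squared Euclidean distance to the micro-cluster center. The names `MicroCluster`, `gravity_similarity`, `assign`, and the `threshold` parameter are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class MicroCluster:
    center: List[float]  # probability-weighted mean of absorbed tuples
    mass: float          # assumed: sum of existence probabilities

    def absorb(self, point: Sequence[float], prob: float) -> None:
        """Fold a new tuple into the micro-cluster summary."""
        new_mass = self.mass + prob
        self.center = [
            (c * self.mass + x * prob) / new_mass
            for c, x in zip(self.center, point)
        ]
        self.mass = new_mass


def gravity_similarity(point: Sequence[float], prob: float,
                       mc: MicroCluster, eps: float = 1e-9) -> float:
    """Assumed form: m1 * m2 / d^2 (eps avoids division by zero)."""
    d2 = sum((x - c) ** 2 for x, c in zip(point, mc.center))
    return prob * mc.mass / (d2 + eps)


def assign(point: Sequence[float], prob: float,
           clusters: List[MicroCluster], threshold: float) -> MicroCluster:
    """Attach the tuple to the most attractive micro-cluster, or start
    a new one (a potential outlier) if no attraction reaches threshold."""
    best = max(clusters,
               key=lambda mc: gravity_similarity(point, prob, mc),
               default=None)
    if best is not None and gravity_similarity(point, prob, best) >= threshold:
        best.absorb(point, prob)
        return best
    fresh = MicroCluster(center=list(point), mass=prob)
    clusters.append(fresh)
    return fresh


clusters: List[MicroCluster] = []
first = assign((0.0, 0.0), 0.9, clusters, threshold=1.0)
second = assign((0.1, 0.0), 0.8, clusters, threshold=1.0)    # nearby: absorbed
third = assign((10.0, 10.0), 0.9, clusters, threshold=1.0)   # far away: new micro-cluster
```

Under these assumptions, a tuple with low existence probability exerts a weaker pull and is less likely to join an established micro-cluster, which is one way similarity and uncertainty could be combined in a single score. The paper's actual definition, outlier handling, and offline relative density step may differ.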