An Efficient Clustering Algorithm for High Dimensional Turnstile Data Streams
(Chinese title: 高维Turnstile型数据流聚类算法)

Cited by: 6

Abstract: Existing data stream clustering algorithms can handle only Time Series and Cash Register data streams, and their accuracy on high-dimensional data streams is unsatisfactory. This paper presents HT-Stream, a subspace clustering algorithm for high-dimensional Turnstile data streams. HT-Stream partitions the data space into grid cells, maintains the information for each cell dynamically online, stores summary statistics in a tilted time window, and outputs clustering results offline for a user-specified time horizon. It addresses the high-dimensional clustering problem and can discover clusters of arbitrary shape. Experiments on real and synthetic datasets show that the algorithm is applicable and effective.
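The grid-based idea in the abstract can be illustrated with a small Python sketch. It is not the HT-Stream algorithm itself: the class `GridSummary`, the function `connected_clusters`, the fixed `cell_width`, and the `min_count` density threshold are all hypothetical names and parameters chosen for illustration, and the tilted time window and the subspace search are left out. The sketch only shows how per-cell counts might be maintained online when updates can be insertions or deletions (the Turnstile model), and how dense neighbouring cells might then be grouped offline into clusters of arbitrary shape.

```python
from collections import defaultdict, deque


class GridSummary:
    """Hypothetical per-cell counts under Turnstile updates (not HT-Stream itself)."""

    def __init__(self, cell_width):
        self.cell_width = cell_width
        self.counts = defaultdict(int)  # cell coordinates -> net point count

    def cell_of(self, point):
        # Map a point to the integer coordinates of the grid cell containing it.
        return tuple(int(x // self.cell_width) for x in point)

    def update(self, point, delta):
        # Turnstile model: delta is +1 for an insertion and -1 for a deletion.
        cell = self.cell_of(point)
        self.counts[cell] += delta
        if self.counts[cell] <= 0:
            # Drop empty cells so memory tracks only occupied cells.
            del self.counts[cell]

    def dense_cells(self, min_count):
        # Cells whose net count reaches the (assumed) density threshold.
        return {c for c, n in self.counts.items() if n >= min_count}


def connected_clusters(cells):
    """Offline step: group dense cells that share a face into clusters."""
    remaining = set(cells)
    clusters = []
    while remaining:
        start = remaining.pop()
        cluster, queue = {start}, deque([start])
        while queue:
            cell = queue.popleft()
            # Visit the two face-adjacent neighbours along every axis.
            for axis in range(len(cell)):
                for step in (-1, 1):
                    nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                    if nb in remaining:
                        remaining.remove(nb)
                        cluster.add(nb)
                        queue.append(nb)
        clusters.append(cluster)
    return clusters
```

Calling `update(point, +1)` or `update(point, -1)` for each stream element and then `connected_clusters(summary.dense_cells(min_count))` yields one set of cell coordinates per cluster. A Cash Register stream would be the special case in which `delta` is always positive.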

Authors: Zhou Xiaoyun (周晓云), Zhang Jing (张净), Sun Zhihui (孙志挥)

Affiliation: Department of Computer Science and Engineering, Southeast University (东南大学)

Source: Computer Science (《计算机科学》), indexed in CSCD and the Peking University Core Journals list, 2006, No. 11, pp. 14-17, 37 (5 pages)

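Funding: Supported by the National Natural Science Foundation of China (70371015), the Ministry of Education Research Fund for the Doctoral Program of Higher Education (20040286009), and the Jiangsu Provincial Higher Education Natural Science Program, General Project (05KJB520022)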

Keywords: Data stream, Subspace clustering, High dimension, Tilted time windows

Classification: TP391.41 [Automation and Computer Technology: Computer Application Technology]


