
An Efficient Clustering Algorithm for High Dimensional Turnstile Data Streams (cited by: 6)
Abstract: Existing data stream clustering algorithms can handle only Time Series and Cash Register data streams, and their accuracy on high-dimensional data streams is unsatisfactory. This paper presents HT-Stream, a subspace clustering algorithm for high-dimensional Turnstile data streams. HT-Stream partitions the data space into a grid, dynamically maintains grid-cell information online, stores summary statistics in a tilted time window, and outputs clustering results offline for a user-specified time horizon. It addresses the high-dimensional clustering problem and can discover clusters of arbitrary shape. Experiments on real and synthetic datasets indicate that the algorithm is applicable and effective.
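To make the abstract's terms concrete: in the Turnstile model, stream updates may be negative as well as positive, so a point that was inserted earlier can later be deleted, whereas the Cash Register model allows only positive updates. The sketch below is not HT-Stream itself. It is a minimal illustration, under assumptions, of maintaining per-cell counts over a uniform grid with signed updates. The class `GridCounter`, its methods, the fixed `cell_width`, and the `min_count` density threshold are all hypothetical names chosen for this example, and the tilted time window and subspace search described in the abstract are omitted.

```python
from collections import defaultdict
from typing import Dict, Sequence, Set, Tuple

Cell = Tuple[int, ...]


class GridCounter:
    """Hypothetical helper: per-cell point counts under signed (turnstile) updates."""

    def __init__(self, cell_width: float) -> None:
        # Assumption: every dimension uses the same fixed cell width.
        self.cell_width = cell_width
        self.counts: Dict[Cell, int] = defaultdict(int)

    def cell_of(self, point: Sequence[float]) -> Cell:
        # Map each coordinate to the index of the grid interval containing it.
        return tuple(int(x // self.cell_width) for x in point)

    def update(self, point: Sequence[float], delta: int) -> None:
        # delta = +1 records an insertion, delta = -1 a deletion.
        cell = self.cell_of(point)
        self.counts[cell] += delta
        if self.counts[cell] == 0:
            # Drop empty cells so memory tracks only occupied cells.
            del self.counts[cell]

    def dense_cells(self, min_count: int) -> Set[Cell]:
        # Cells holding at least min_count points are treated as dense.
        return {c for c, n in self.counts.items() if n >= min_count}


grid = GridCounter(cell_width=1.0)
grid.update((0.2, 0.7), +1)
grid.update((0.4, 0.1), +1)
grid.update((2.5, 3.5), +1)
grid.update((0.4, 0.1), -1)  # a deletion, which a Cash Register stream cannot express
print(grid.dense_cells(min_count=1))
```

A grid-based method would typically form clusters by joining adjacent dense cells; that step, and any per-window bookkeeping, is left out here because the abstract does not specify how HT-Stream performs it.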
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal (北大核心), 2006, No. 11, pp. 14-17, 37 (5 pages)
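Funding: Supported by the National Natural Science Foundation of China (70371015), the Research Fund for the Doctoral Program of Higher Education of the Ministry of Education (20040286009), and the General Project of the Natural Science Program of Jiangsu Higher Education Institutions (05KJB520022).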
Keywords: Data stream, Subspace clustering, High dimension, Tilted time windows


