Random Projection Based Clustering Method of Parallel Data Streams

Cited by: 3

Abstract: A synopsis is maintained hierarchically and dynamically for each data stream. The synopsis is built with random projections and exploits the amnesic feature of data streams. Using the synopsis, the approximate distance between each stream and the cluster centers can be computed quickly, which yields an efficient online K-means clustering method suited to multiple parallel data streams. Experiments confirm that the method is effective and achieves good clustering quality.
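The core idea in the abstract (replace each stream window by a short random-projection sketch, then run K-means on approximate distances) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the hierarchical, amnesic synopsis and its incremental maintenance are omitted, and the Gaussian projection matrix, the farthest-first initialization, the synthetic sinusoid streams, and all function names (`make_projection`, `sketch`, `kmeans_on_sketches`) are assumptions chosen only for illustration.

```python
import numpy as np

def make_projection(n, k, seed=0):
    """Gaussian random projection matrix of shape (k, n), scaled by 1/sqrt(k)
    so that Euclidean distances are preserved in expectation."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((k, n)) / np.sqrt(k)

def sketch(windows, R):
    """Project each stream window (one row per stream) from n to k dimensions."""
    return np.asarray(windows, dtype=float) @ R.T

def farthest_first(S, n_clusters):
    """Deterministic initialization (an assumption, not from the paper):
    start at S[0], then repeatedly add the point farthest from chosen centers."""
    centers = [S[0]]
    for _ in range(1, n_clusters):
        d2 = np.min([((S - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(S[d2.argmax()])
    return np.array(centers)

def assign(S, centers):
    """Index of the nearest center for every sketch."""
    d2 = ((S[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans_on_sketches(S, n_clusters, n_iter=20):
    """Plain K-means run entirely in the low-dimensional sketch space."""
    centers = farthest_first(S, n_clusters)
    for _ in range(n_iter):
        labels = assign(S, centers)
        for c in range(n_clusters):
            members = S[labels == c]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[c] = members.mean(axis=0)
    return assign(S, centers), centers

# Synthetic example: 20 streams, window length 1024, two underlying patterns.
rng = np.random.default_rng(1)
n, k = 1024, 64
t = np.arange(n) / n
group_a = np.sin(2 * np.pi * 5 * t) + 0.1 * rng.standard_normal((10, n))
group_b = np.sin(2 * np.pi * 11 * t) + 0.1 * rng.standard_normal((10, n))
windows = np.vstack([group_a, group_b])

R = make_projection(n, k)
S = sketch(windows, R)  # shape (20, 64) instead of (20, 1024)
labels, centers = kmeans_on_sketches(S, n_clusters=2)

exact = np.linalg.norm(windows[0] - windows[10])  # distance on raw windows
approx = np.linalg.norm(S[0] - S[10])             # distance on sketches
print(labels, exact, approx)
```

Because the projection is linear, a center computed as the mean of sketches equals the sketch of the mean window, so the stream-to-center distances used by K-means are approximated in the same way as stream-to-stream distances.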

Authors: Chen Huahui, Shi Bole

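Affiliations: Department of Computing and Information Technology, Fudan University; College of Information Science and Engineering, Ningbo University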

Source: Pattern Recognition and Artificial Intelligence (模式识别与人工智能; indexed in EI, CSCD, and the Peking University Core Journals list), 2009, No. 1, pp. 113-122 (10 pages)

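Funding: Supported by the National Natural Science Foundation of China (No. 60773072), the Natural Science Foundation of Zhejiang Province (No. Y104144), and a project of the Education Department of Zhejiang Province (No. 20051737)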

Keywords: synopsis, amnesic feature, random projection, data stream

Classification code: TP301.6 [Automation and Computer Technology: Computer System Architecture]


