期刊文献+

基于谱聚类的多数据流演化事件挖掘 被引量:5

Mining Evolutionary Events from Multi-Streams Based on Spectral Clustering
下载PDF
导出
摘要 为解决从多数据流挖掘演化事件这一难题,提出了一种多数据流上的谱聚类算法SCAM(spectral clustering algorithm of multi-streams),其相似矩阵基于耦合度构造,而耦合度衡量了两个数据流的动态相似性.提出了算法EEMA(evolutionary events mining algorithm),该算法基于聚类模型的演变挖掘多数据流的演化事件.定义了聚类模型凝聚度,用以衡量聚类的紧凑程度,并证明了凝聚度的上界.基于到上界的距离和规范化相似矩阵的特征间隙,定义了聚类模型质量,并作为EEMA的优化目标自动地确定聚簇数k.设计了O-EEMA作为EEMA的优化实现,其时间复杂度为O(cn2/2).在合成和真实数据集上的实验结果表明,EEMA和O-EEMA是有效的、可行的. To solve the problem of mining evolutionary events from multi-streams, this paper proposes a spectral clustering algorithm, SCAM (spectral clustering algorithm of multi-streams), to generate the clustering models of Multi-Streams. The similarity matrix in the clustering models of Multi-Streams are based on Coupling Degree, which measures the dynamic similarity between two streams. In addition, this paper also proposes an algorithm, EEMA (evolutionary events mining algorithm), to discover the evolutionary event points based on the drift of clustering models. EEMA takes the index of Clustering Model Quality as the optimization objective in determing the number of clusters automatically. The Clustering Model Quality combines the matrix perturbation theory and the Clustering Cohesion, which has a sound upper bound and is used to measure the compactness of a clustering model. Finally, this paper presents O-EEMA (optimized-EEMA) as the optimization of EEMA with the temporal complexity of O(cn^2/2), and the results of extensive experiments on the synthetic and real data set show that EEMA and O-EEMA are effective and practicable.
出处 《软件学报》 EI CSCD 北大核心 2010年第10期2395-2409,共15页 Journal of Software
基金 国家自然科学基金No.600773169 国家"十一.五"科技支撑计划No.2006BAI05A01~~
关键词 多数据流 耦合聚类 演化事件 矩阵扰动 multi-streams spectral clustering evolutionary event matrix perturbation
  • 相关文献

参考文献2

二级参考文献18

  • 1Teng WG,Chen MS,Yu PS.A regression-based temporal pattern Ming scheme for data streams.In:Freytag JC,Lockemann PC,eds.Proc.of the 29th Int'l Conf.on Very Large Data Bases (VLDB 2003).Berlin:Morgan Kaufmann Publishers,2003.93-104.
  • 2Ben-David S,Gehrke J,Kifer D.Detecting change in data streams.In:Nascimento MA,Kossmann D,eds.Proc.of the 30th Int'l Conf.on Very Large Data Bases (VLDB 2004).Toronto:Morgan Kaufmann Publishers,2004.180-191.
  • 3Yu JX,Chong ZH,Lu HJ,Zhou AY.False positive or false negative:Mining frequent Itemsets from high speed transactional data streams.In:Nascimento MA,Kossmann D,eds.Proc.of the 30th Int'l Conf.on Very Large Data Bases (VLDB 2004).Toronto:Morgan Kaufmann Publishers,2004.204-215.
  • 4Datar M,Gionis A,Indyk P,Motwani R.Maintaining stream statistics over sliding windows.In:Eppstein D,ed.Proc.of the 13th Annual ACM-SIAM Symp.on Discrete Algorithms.San Francisco:ACM Press,2002.635-644.
  • 5Gehrke J,Korn F,Srivastava D.On computing correlated aggregates over continual data streams.Walid GA,ed.In:Proc.of the ACM SIGMOD Int'l Conf.on Management of Data.New York:ACM Press,2001.13-24.
  • 6Rafiei D,Mendelzon A.Similarity-Based queries for time series data.In:Peckham J,ed.Proc.of the ACM SIGMOD Int'l Conf.on Management of Data.Tucson:ACM Press,1997.13-25.
  • 7Dula S,Kim C,Shim K.XWAVE:Optimal and approximate extended wavelets for streaming data.In:Nascimento MA,Kossmann D,eds.Proc.of the 30th Int'l Conf.on Very Large Data Bases (VLDB 2004).Toronto:Morgan Kaufmann Publishers,2004.288-299.
  • 8Gilbert AC,Kotidis Y,Muthukrishnan S,Strauss MJ.Surfing wavelets on streams:One-Pass summaries for approximate aggregate queries.In:Apers PMG,Atzeni P,eds.Proc.of the 27th Int'l Conf.on Very Large Data Bases (VLDB 2001).Roma:Morgan Kaufmann Publishers,2001.79-88.
  • 9Zhu YY,Shasha D.StatStream:Statistical monitoring of thousands of data streams in real time.In:Bressan S,Chaudhri AB,eds.Proc.of the 28th Int'l Conf.on Very Large Data Bases (VLDB 2002).Hong Kong:Springer-Verlag,2002.358-369.
  • 10Burrus CS,Gopinath RA,Guo HT.Introduction to Wavelets and Wavelet Transform.Englewood Cliffs:Prentice Hall,1998.1-145.

共引文献9

同被引文献68

  • 1张敏,于剑.基于划分的模糊聚类算法[J].软件学报,2004,15(6):858-868. 被引量:176
  • 2唐伟,周志华.基于Bagging的选择性聚类集成[J].软件学报,2005,16(4):496-502. 被引量:94
  • 3汤秋菊,李义杰.无指导聚类在信用卡促销中的应用[J].计算机与现代化,2007(9):100-102. 被引量:1
  • 4NIE Feiping, ZENG Zinan, TSANG I W, et al. Spectral embedded clustering: a framework for in-sample and out- of-sample spectral clustering [ J ]. IEEE Transactions on Neural Networks, 2011, 22 ( 11 ) : 1796-1808.
  • 5SIDI O, KAICK O V, KLEINANAN Y, et al. Unsuper- vised co-segmentation of a set of shapes via descriptor- space spectral clustering [J ]. ACM Transacti-ons on Graphics, 2011, 30(6) :126-135.
  • 6XIANG Tao, GONG Shaogang. Spectral clustering with eigenvector selection [J]. Pattern Recognition, 2008, 1 : 1012-1029.
  • 7ZHAO Feng, JIAO Licheng, LIU Hanqiang, et al. Spec- tral clustering with eigenvector selection based on entropy ranking[J]. Neurocomputing, 2010, 73:1704-1717.
  • 8RELAGLIATI N, VERRI A. Spectral clustering with more than K eigenvectors[J]. Neurocomputing, 2011, 74 : 1391-1401.
  • 9LUXBURG U V. A tutorial on spectral clustering [ J ]. Statistics and Computing, 2007, 17(4) :395-416.
  • 10ZHANG Daoqiang, CHEN Songcan, ZHOU Zhihua. Constraint score: a new filter method for feature selec- tion with pairwise constraints [ J ]. Pattern Recognition, 2008, 41 : 1440-1451.

引证文献5

二级引证文献20

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部