Data stream clustering based on feature selection
Abstract: In data stream clustering, redundant features degrade clustering quality, so removing them is particularly important. To address this problem, a data stream clustering algorithm based on feature selection (DSCFC) is proposed. DSCFC is a one-pass clustering algorithm that applies feature ranking, feature grading, detection of redundant and unimportant features, and removal of the redundant features. Experimental results show that DSCFC can detect and remove the redundant features hidden in a data stream and that, when clustering data streams that contain redundant features, it is more effective than CluStream and yields better clustering quality.
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2010, No. 19, pp. 4235-4237, 4241 (4 pages)
Funding: National Natural Science Foundation of China (10871031, 60474070); Hunan Provincial Science and Technology Program (2008FJ3015)
Keywords: clustering data streams; feature selection; redundant features; cost matrix; feature removal
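
The abstract names four steps (rank features, grade them, detect redundant ones, remove them) but this record does not give the details of DSCFC, including how its cost matrix is built. The following is therefore only a minimal sketch of the general idea, not the authors' method: it assumes variance as a stand-in ranking score and absolute Pearson correlation above a threshold as a stand-in redundancy test. The names `select_features`, `project`, and the `threshold` value are hypothetical.

```python
import math
from typing import List, Sequence


def variance(col: Sequence[float]) -> float:
    """Population variance of one feature column."""
    mean = sum(col) / len(col)
    return sum((v - mean) ** 2 for v in col) / len(col)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two columns; 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(chunk: List[List[float]], threshold: float = 0.95) -> List[int]:
    """Return the sorted indices of the features kept for one stream chunk.

    Assumption (not from the paper): features are ranked by variance, and a
    feature counts as redundant when it is almost perfectly correlated with
    a higher-ranked feature that has already been kept.
    """
    columns = list(zip(*chunk))
    ranked = sorted(range(len(columns)),
                    key=lambda j: variance(columns[j]), reverse=True)
    kept: List[int] = []
    for j in ranked:
        if all(abs(pearson(columns[j], columns[k])) < threshold for k in kept):
            kept.append(j)
    return sorted(kept)


def project(chunk: List[List[float]], kept: List[int]) -> List[List[float]]:
    """Drop the removed features from every point in the chunk."""
    return [[row[j] for j in kept] for row in chunk]


# One small chunk of a stream: feature 1 is exactly twice feature 0.
chunk = [
    [1.0, 2.0, 5.0],
    [2.0, 4.0, 1.0],
    [3.0, 6.0, 4.0],
    [4.0, 8.0, 2.0],
]
kept = select_features(chunk)
reduced = project(chunk, kept)
print(kept)     # indices of the surviving features
print(reduced)  # the chunk with the redundant feature removed
```

In this toy chunk, features 0 and 1 carry the same information, so only one of them survives, and the reduced points could then be passed to any stream clustering routine such as a micro-cluster update. A real one-pass setting would maintain the ranking and correlation statistics incrementally instead of recomputing them per chunk.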