Data stream clustering based on feature selection

Abstract: In data stream clustering, redundant features degrade clustering quality, so removing them is particularly important. To address this problem, a data stream clustering algorithm based on feature selection (DSCFC) is proposed. DSCFC is a one-pass clustering algorithm that applies feature ranking, feature grading, detection of redundant and unimportant features, and redundant feature removal. Experimental results show that DSCFC can detect and remove the redundant features hidden in a data stream, and that on data streams containing redundant features it is more effective than CluStream and yields better clustering quality.
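The abstract names the stages of the algorithm (ranking, redundancy detection, removal) but gives no details of how each is computed. Purely as an illustration of that general pipeline, the sketch below uses stand-in techniques that are assumptions and not the DSCFC procedure from the paper: features are ranked by variance, and a feature is treated as redundant when its absolute Pearson correlation with an already kept feature reaches a hypothetical threshold. The function names, the threshold value, and the sample window are all invented for the example.

```python
from math import sqrt


def variance(col):
    """Population variance of one feature column."""
    mean = sum(col) / len(col)
    return sum((x - mean) ** 2 for x in col) / len(col)


def pearson(a, b):
    """Pearson correlation of two columns; 0.0 if either is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sqrt(sum((x - ma) ** 2 for x in a))
    sb = sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 0.0
    return cov / (sa * sb)


def select_features(rows, threshold=0.95):
    """Return sorted indices of features kept after redundancy removal.

    Assumed stand-in logic (not taken from the paper):
    1. rank features by variance, highest first;
    2. walk the ranking and keep a feature only if it is not strongly
       correlated with any feature that has already been kept.
    """
    cols = list(zip(*rows))
    ranked = sorted(range(len(cols)),
                    key=lambda j: variance(cols[j]),
                    reverse=True)
    kept = []
    for j in ranked:
        if all(abs(pearson(cols[j], cols[k])) < threshold for k in kept):
            kept.append(j)
    return sorted(kept)


# One hypothetical window of stream points; feature 1 is exactly 2 * feature 0.
window = [
    (1.0, 2.0, 5.0),
    (2.0, 4.0, 3.0),
    (3.0, 6.0, 8.0),
    (4.0, 8.0, 1.0),
]
kept = select_features(window)
reduced = [tuple(row[j] for j in kept) for row in window]
```

In this toy window, features 0 and 1 are perfectly correlated, so the lower-variance one (feature 0) is dropped and the reduced points can then be handed to whatever stream clustering step follows. A real one-pass setting would need incrementally maintained statistics rather than a stored window; that part is omitted here.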

Authors: Long Pengfei (龙鹏飞), Tang Jun (唐军), Wang Lin (王琳)

Affiliation: School of Computer and Communication Engineering, Changsha University of Science and Technology

Source: Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University Core Journals list, 2010, No. 19, pp. 4235-4237, 4241 (4 pages)

Funding: National Natural Science Foundation of China (10871031, 60474070); Hunan Provincial Science and Technology Plan Project (2008FJ3015)

Keywords: data streams; clustering; feature selection; redundant features; cost matrix; feature removal

Classification number: TP391 [Automation and Computer Technology: Computer Application Technology]
