Feature Selection Based on Difference and Similitude in Data Mining

Abstract: Feature selection is a preprocessing step in data mining, and heuristic search algorithms are often used for it. Many of these algorithms are based on discernibility matrices, which capture only the differences between objects in an information system. Because a discernibility matrix does not reveal similar characteristics, the resulting rules may not be the simplest. Difference-similitude (DS) methods take both difference and similitude into account, but the existing search strategy can cause some important features to be ignored. This paper proposes an improved DS-based algorithm to solve this problem. The improved algorithm defines an attribute rank function that considers both difference and similitude in feature selection. Experiments show that the algorithm is effective, especially for large-scale databases. Its time complexity is O(|C|^2 |U|^2).
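The abstract does not give the attribute rank function itself, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes a decision table with condition attributes C and objects U, takes a difference set to be the attributes on which two objects with different decisions disagree, and takes a similitude set to be the attributes on which two objects with the same decision agree. The names `difference_and_similitude_sets`, `greedy_select`, and the tie-breaking `rank` are hypothetical and are not the paper's definitions.

```python
from itertools import combinations


def difference_and_similitude_sets(rows, decisions):
    """Build difference and similitude sets from a decision table.

    rows      -- list of tuples of condition-attribute values (one per object)
    decisions -- list of decision labels, aligned with rows
    Assumed definitions (not taken from the paper):
      difference set: attributes that differ for a pair with different decisions
      similitude set: attributes that agree for a pair with the same decision
    """
    n_attrs = len(rows[0])
    all_attrs = frozenset(range(n_attrs))
    diff_sets, sim_sets = [], []
    for i, j in combinations(range(len(rows)), 2):
        differing = frozenset(a for a in range(n_attrs) if rows[i][a] != rows[j][a])
        if decisions[i] != decisions[j]:
            if differing:  # skip inconsistent pairs that no attribute can separate
                diff_sets.append(differing)
        else:
            sim_sets.append(all_attrs - differing)
    return diff_sets, sim_sets


def greedy_select(rows, decisions):
    """Greedily pick attributes until every difference set is covered.

    The rank used here is a placeholder: the number of still-uncovered
    difference sets containing the attribute, with ties broken by the
    number of similitude sets containing it.
    """
    diff_sets, sim_sets = difference_and_similitude_sets(rows, decisions)
    selected = set()
    uncovered = list(diff_sets)
    while uncovered:
        def rank(attr):
            d = sum(attr in s for s in uncovered)
            s = sum(attr in s for s in sim_sets)
            return (d, s)

        candidates = sorted(set().union(*uncovered))
        best = max(candidates, key=rank)
        selected.add(best)
        uncovered = [s for s in uncovered if best not in s]
    return selected


# Toy table: attribute 0 (and, redundantly, attribute 2) determines the decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = ["n", "n", "y", "y"]
print(greedy_select(rows, decisions))  # prints {0}
```

In this sketch the pairwise pass compares |C| attributes for each of roughly |U|^2 / 2 object pairs, and the greedy loop runs at most |C| times, rescoring up to |C| candidates against those pairs on each pass.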

Authors: WU Ming, YAN Puliu

Affiliation: School of Electronic Information

Source: Wuhan University Journal of Natural Sciences (武汉大学学报（自然科学英文版）), CAS, 2007, Issue 3, pp. 467-470 (4 pages)

Funding: Supported by the National Natural Science Foundation of China (90204008) and the Chen-Guang Plan of Wuhan City (20055003059-3)

Keywords: knowledge reduction; feature selection; rough set; difference set; similitude set; attribute rank function

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
