Feature Selection Based on Difference and Similitude in Data Mining

Abstract: Feature selection is a preprocessing step in data mining, and heuristic search algorithms are often used for it. Many of these algorithms are based on discernibility matrices, which capture only the differences between objects in an information system. Because a discernibility matrix does not reveal similar characteristics, the resulting rules may not be the simplest. Difference-similitude (DS) methods take both difference and similitude into account, but the existing search strategy can cause some important features to be ignored. This paper proposes an improved DS-based algorithm to solve this problem. The improved algorithm defines an attribute rank function that considers both difference and similitude during feature selection. Experiments show that the algorithm is effective, especially for large-scale databases. Its time complexity is O(|C|^2 |U|^2).
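The abstract does not give the definitions used in the paper, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes that C is the set of condition attributes and U the set of objects, that a difference set holds the attributes on which two objects with different decisions disagree (the usual discernibility-matrix entry), and that a similitude set holds the attributes on which two objects with the same decision agree. The `rank` function is hypothetical and is not the paper's attribute rank function: it prefers attributes that hit the most uncovered difference sets and breaks ties by how many similitude sets contain the attribute.

```python
from itertools import combinations


def difference_and_similitude_sets(rows, decisions):
    """Build pairwise difference and similitude sets of attribute indices.

    Assumed definitions (not taken from the paper):
      - difference set: attributes that differ between two objects
        with different decisions;
      - similitude set: attributes that agree between two objects
        with the same decision.
    """
    n_attrs = len(rows[0])
    all_attrs = frozenset(range(n_attrs))
    diff_sets, sim_sets = [], []
    for i, j in combinations(range(len(rows)), 2):
        differing = frozenset(
            a for a in range(n_attrs) if rows[i][a] != rows[j][a]
        )
        if decisions[i] != decisions[j]:
            # Pairs with identical condition values cannot be discerned; skip.
            if differing:
                diff_sets.append(differing)
        else:
            agreeing = all_attrs - differing
            if agreeing:
                sim_sets.append(agreeing)
    return diff_sets, sim_sets


def rank(attr, uncovered, sim_sets):
    """Hypothetical rank: (difference sets hit, similitude sets containing attr)."""
    return (
        sum(attr in d for d in uncovered),
        sum(attr in s for s in sim_sets),
    )


def greedy_select(rows, decisions):
    """Greedily pick attributes until every difference set is covered."""
    diff_sets, sim_sets = difference_and_similitude_sets(rows, decisions)
    uncovered = list(diff_sets)
    selected = []
    n_attrs = len(rows[0])
    while uncovered:
        candidates = [a for a in range(n_attrs) if a not in selected]
        best = max(candidates, key=lambda a: rank(a, uncovered, sim_sets))
        selected.append(best)
        # Drop every difference set that the chosen attribute now covers.
        uncovered = [d for d in uncovered if best not in d]
    return sorted(selected)


# Toy decision table: three condition attributes, binary decision.
rows = [(0, 0, 1), (0, 1, 1), (1, 0, 0), (1, 1, 0)]
decisions = [0, 0, 1, 1]
print(greedy_select(rows, decisions))  # [0]
```

In this sketch, each greedy round scans at most |U|^2 pairwise sets for each of at most |C| candidate attributes, and there are at most |C| rounds, which gives O(|C|^2 |U|^2) for the sketch. Whether that matches the analysis in the paper cannot be confirmed from the abstract alone.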
Source: Wuhan University Journal of Natural Sciences (武汉大学学报(自然科学英文版)), CAS, 2007, Issue 3, pp. 467-470 (4 pages)
Funding: Supported by the National Natural Science Foundation of China (90204008) and the Chen-Guang Plan of Wuhan City (20055003059-3)
Keywords: knowledge reduction; feature selection; rough set; difference set; similitude set; attribute rank function
