
A Scalable Feature Selection Algorithm for Classification (cited by 3)
Abstract: Feature selection is an important problem in classification mining. This paper derives from statistical theory a new measure of the dependence between a feature and the class, called SCD: the CV coefficient of the class distributions along the sequence of each feature's values. Based on SCD, a linear, scalable (I/O-linear) feature selection algorithm, StaFSOS, is constructed. When there are only two classes, SCD is proven to satisfy the monotonicity required by the Branch & Bound (B&B) algorithm, so StaFSOS and B&B are combined into a complete variant, BBStaFS. On 12 standard benchmark datasets, the features selected by StaFSOS are almost identical to the target feature sets, and StaFSOS runs more efficiently than the other algorithms; on one further benchmark, BBStaFS selects exactly the target features. On real-world data with 1000 samples and 20 features, StaFSOS selects an accurate and effective feature set in half the runtime of GRSR, currently one of the fastest algorithms.
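The abstract turns on one property: if a measure is monotone (a feature subset never scores better than any superset containing it), Branch & Bound can discard a whole subtree as soon as its root scores no better than the best subset found so far. The SCD formula is not given on this page, so the Python sketch below is hypothetical: it substitutes a simple inconsistency count (samples that disagree with the majority class of their feature-value pattern), which is monotone for any number of classes, and plugs it into a Narendra-Fukunaga style search that removes features from the full set. `inconsistency_count` and `branch_and_bound` are illustrative names, not the paper's StaFSOS or BBStaFS.

```python
from collections import Counter, defaultdict


def inconsistency_count(rows, labels, features):
    """Hypothetical stand-in for SCD: count the samples that disagree with
    the majority class of their value pattern on `features`."""
    groups = defaultdict(Counter)
    for row, y in zip(rows, labels):
        groups[tuple(row[f] for f in features)][y] += 1
    # Dropping a feature can only merge patterns, and merging never lowers
    # this count, so it is non-increasing as features are added (monotone).
    return sum(sum(c.values()) - max(c.values()) for c in groups.values())


def branch_and_bound(rows, labels, n_features, d):
    """Hypothetical helper: return the size-d subset with the fewest
    inconsistencies and its count, using the monotone criterion
    J(S) = -inconsistency_count(S)."""
    best = {"score": float("-inf"), "subset": None}

    def search(current, start):
        score = -inconsistency_count(rows, labels, current)
        # Bound: no subset of `current` can score higher than `current`,
        # so prune when it cannot strictly beat the best leaf found so far.
        if score <= best["score"]:
            return
        if len(current) == d:
            best["score"], best["subset"] = score, current
            return
        # Branch: remove one more feature; only indices >= start may be
        # removed, so each removal set (and each subset) is generated once.
        for i, f in enumerate(current):
            if f >= start:
                search(current[:i] + current[i + 1:], f + 1)

    search(tuple(range(n_features)), 0)
    return best["subset"], -best["score"]


if __name__ == "__main__":
    # Toy data: the label equals feature 0; features 1 and 2 are
    # uninformative on their own.
    rows = [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]
    labels = [0, 0, 1, 1]
    print(branch_and_bound(rows, labels, n_features=3, d=1))  # -> ((0,), 0)
```

Removing features only in increasing index order keeps the search tree free of duplicate subsets, and the pruning is exact only because the criterion is monotone. This is why the two-class monotonicity proof for SCD is what allows StaFSOS to be combined with B&B into BBStaFS.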
Source: Chinese Journal of Computers (《计算机学报》), 2005, No. 7, pp. 1223-1229 (7 pages). Indexed in EI and CSCD; listed as a Peking University core journal.
Funding: Supported by the National High Technology Research and Development Program of China (863 Program), grant No. 2004AA114030.
Keywords: data mining; classification; feature selection