
A Method of Feature Selection for Continuous Attributes
Abstract: Feature selection is a form of data reduction and has a strong influence on both the efficiency and the effectiveness of machine learning. Based on the distribution of objects and their class labels, the continuous feature space is partitioned into subspaces, each with a clear boundary and a single class label. Each subspace is then projected onto every feature, and the statistical significance of the projections is used to estimate how well each feature distinguishes the current subspace from all subspaces carrying different class labels. These estimates are assembled into a discrimination-ability matrix, from which the features are ranked from the highest classifying power to the lowest. An information-gain function over feature sets is then defined and, guided by the ranking, features are added one by one to determine the final feature subset. Experiments on data sets from the UCI (University of California Irvine) repository show that the selected feature subsets improve both the efficiency and the classification accuracy of machine learning, demonstrating the feasibility of the proposed feature selection method.
Source: Journal of Shandong University (Engineering Science), CAS, PKU Core, 2011, No. 6, pp. 1-6, 17 (7 pages)
Funding: National High-Tech Research and Development Program of China (863 Program) (2009AA062802); National Natural Science Foundation of China (60473125); China National Petroleum Corporation (CNPC) Innovation Fund for Young and Middle-aged Scientists (05E7013); National Science and Technology Major Project sub-project (G5800-08-ZS-WX)
Keywords: data reduction; feature selection; continuous attributes; decision table
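The abstract outlines a rank-then-greedily-select procedure: score each feature's discriminating power, sort, then add features one at a time while a set-level gain criterion keeps improving. The paper's subspace-projection discrimination matrix is not reproduced in this excerpt, so the sketch below substitutes a simple Fisher-style per-feature score; the function names (`rank_features`, `select_subset`, `fisher_gain`) and the stopping threshold `min_gain` are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def _fisher_score(col, y, classes):
    # Between-class spread of per-class means over average within-class
    # spread; a stand-in for the paper's discrimination-ability estimate.
    means = [col[y == c].mean() for c in classes]
    stds = [col[y == c].std() + 1e-9 for c in classes]
    return float(np.std(means) / np.mean(stds))

def rank_features(X, y):
    """Rank features from highest to lowest discriminating power."""
    classes = np.unique(y)
    scores = [_fisher_score(X[:, j], y, classes) for j in range(X.shape[1])]
    return np.argsort(scores)[::-1]  # best-discriminating feature first

def fisher_gain(Xs, y):
    """Set-level gain: mean per-feature score of the selected columns
    (illustrative; the paper defines its own information-gain function)."""
    classes = np.unique(y)
    return float(np.mean([_fisher_score(Xs[:, j], y, classes)
                          for j in range(Xs.shape[1])]))

def select_subset(X, y, gain, min_gain=1e-3):
    """Greedy forward selection over the ranked features: add the next
    feature only while the subset's gain keeps improving."""
    order = rank_features(X, y)
    chosen, best = [], -np.inf
    for j in order:
        g = gain(X[:, chosen + [int(j)]], y)
        if chosen and g - best < min_gain:
            break
        chosen.append(int(j))
        best = g
    return chosen

# Usage on synthetic data: feature 1 separates the classes, feature 0 is noise.
rng = np.random.default_rng(0)
y = np.array([0] * 50 + [1] * 50)
informative = np.concatenate([rng.normal(0, 1, 50), rng.normal(5, 1, 50)])
noise = rng.normal(0, 1, 100)
X = np.column_stack([noise, informative])
print(select_subset(X, y, fisher_gain))  # keeps the informative feature only
```

With the mean-score gain, adding the noisy feature lowers the subset's score, so the greedy loop stops after the informative feature, mirroring the "add features one by one until gain stalls" idea in the abstract.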

