A method of feature selection for continuous attributes


Abstract: Feature selection is a data reduction technique that strongly affects the efficiency and effectiveness of machine learning. Based on the distribution of objects in the feature space, the continuous feature space is partitioned into multiple subspaces, each with a single class label and a clear boundary. Each subspace is projected onto every feature, and statistical significance is used to evaluate how well each feature distinguishes the current subspace from all subspaces with a different class label. An evaluation matrix built from these discriminating-power estimates yields a ranking of the features by classification ability. An information gain for the discriminating power of a feature set is then introduced and, combined with the ranking, features are selected one at a time until the final feature subset is obtained. In experiments on data sets from the UCI (University of California, Irvine) repository, the selected feature subsets improved both the efficiency and the classification accuracy of machine learning, indicating that the method is feasible.
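The pipeline outlined in the abstract (evaluate each feature on pairs of differently labelled subspaces, rank the features, then add them one at a time) can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not define the partitioning step, the significance test, or the information gain function, so the sketch assumes one subspace per class label, a Welch-style separation statistic, a hypothetical cut-off `THRESHOLD`, and a "gain" that simply counts newly separated subspace pairs. All function names and the toy data are illustrative.

```python
from itertools import combinations
from statistics import mean, pvariance

THRESHOLD = 2.0  # assumed cut-off for "statistically separated"; not from the paper


def separation(a, b):
    """Welch-style statistic between two 1-D projections (larger = better separated)."""
    gap = abs(mean(a) - mean(b))
    spread = (pvariance(a) / len(a) + pvariance(b) / len(b)) ** 0.5
    if spread == 0.0:
        return float("inf") if gap > 0.0 else 0.0
    return gap / spread


def evaluation_matrix(X, y):
    """Rows: pairs of subspaces with different labels; columns: features."""
    subspaces = {}
    for row, label in zip(X, y):
        # Simplifying assumption: one subspace per class label.
        subspaces.setdefault(label, []).append(row)
    n_features = len(X[0])
    return [
        [separation([r[f] for r in subspaces[p]], [r[f] for r in subspaces[q]])
         for f in range(n_features)]
        for p, q in combinations(sorted(subspaces), 2)
    ]


def select_features(matrix, threshold=THRESHOLD):
    """Rank features by pairs separated, then keep each feature that adds a new pair."""
    n_features = len(matrix[0])
    covers = [{i for i, row in enumerate(matrix) if row[f] >= threshold}
              for f in range(n_features)]
    ranking = sorted(range(n_features), key=lambda f: (-len(covers[f]), f))
    selected, separated = [], set()
    for f in ranking:
        if covers[f] - separated:  # stand-in for a positive gain
            selected.append(f)
            separated |= covers[f]
    return ranking, selected


# Toy data: feature 0 is noise, feature 1 isolates class "a", feature 2 isolates class "c".
X = [
    [0.3, 1.0, 5.0], [0.9, 1.2, 5.1], [0.1, 0.9, 4.9],  # class a
    [0.8, 3.0, 5.0], [0.2, 3.1, 5.2], [0.5, 2.9, 4.8],  # class b
    [0.4, 3.0, 9.0], [0.6, 3.1, 9.1], [0.5, 2.9, 8.9],  # class c
]
y = ["a"] * 3 + ["b"] * 3 + ["c"] * 3

ranking, selected = select_features(evaluation_matrix(X, y))
print("ranking:", ranking, "selected:", selected)
```

On this toy data the noise feature is ranked last and left out, while features 1 and 2 are both kept because each separates a pair of classes that the other cannot. A faithful implementation would replace the per-class grouping, the statistic, and the gain with the definitions given in the full paper.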

Authors: LI Guohe (李国和), YUE Xiang (岳翔), LI Xue (李雪), WU Weijiang (吴卫江), LI Hongqi (李洪奇)

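Affiliations: College of Geophysics and Information Engineering, China University of Petroleum (Beijing); Shida Zhaoxin Research Institute of Digital Identity Management and Internet of Things Technology (石大兆信数字身份管理与物联网技术研究院); School of Information Technology and Electrical Engineering, The University of Queensland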

Source: Journal of Shandong University (Engineering Science) (《山东大学学报（工学版）》), indexed in CAS and the Peking University Core Journals list, 2011, No. 6, pp. 1-6, 17 (7 pages in total)

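Funding: National High Technology Research and Development Program of China (863 Program) (2009AA062802); National Natural Science Foundation of China (60473125); China National Petroleum Corporation (CNPC) Petroleum Science and Technology Innovation Fund for Young and Middle-aged Researchers (05E7013); National Major Special Project, sub-project (G5800-08-ZS-WX)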

Keywords: data reduction; feature selection; continuous attributes; decision table

Classification number: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]




