An Automatic Feature Selection Algorithm for High Dimensional Data Based on the Stochastic Complexity Regularization

Cited by: 1
Abstract: Feature selection in high-dimensional, sparse feature spaces is an open problem in machine learning. The prevalent 1-norm regularization approaches share some theoretical drawbacks: they lack the ability to select grouped features, and they cannot select more features than the sample size. This paper considers the sparse modeling problem from the perspective of stochastic complexity theory and derives an easily computable feature selection model, based on a zero-norm constraint, from the minimax lower bound on model redundancy. The proposed algorithm is provably optimizable and performs automatic feature selection like its 1-norm penalized alternatives while overcoming their main drawbacks. Furthermore, it does not rely on parametric assumptions about the true data-generating mechanism, which makes it broadly applicable. Simulations show that the proposed approach performs comparably to the popular 1-norm penalized methods in ordinary data-modeling tasks, and experiments on real gene data further verify that it outperforms several recently published methods in robustness and predictive accuracy on extremely sparse, high-dimensional problems.
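To make the contrast drawn in the abstract concrete, below is a minimal sketch assuming synthetic data and scikit-learn estimators: Lasso for the 1-norm penalty, and OrthogonalMatchingPursuit standing in for a generic zero-norm-constrained selector. It is not the paper's stochastic-complexity-based algorithm (which is not reproduced here); it only illustrates the p >> n feature-selection setting the abstract describes.

```python
# Illustrative sketch only: contrasts 1-norm (lasso) feature selection with a
# generic zero-norm-style selector (orthogonal matching pursuit). This is NOT
# the paper's stochastic-complexity-derived algorithm.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, OrthogonalMatchingPursuit

# High-dimensional regime the abstract targets: far more features than samples.
X, y, true_coef = make_regression(
    n_samples=60, n_features=500, n_informative=10,
    noise=1.0, coef=True, random_state=0,
)

# 1-norm penalty: convex, but it can select at most about n features and
# tends to pick only one feature out of a correlated group.
lasso = Lasso(alpha=0.1).fit(X, y)
lasso_support = np.flatnonzero(lasso.coef_)

# Zero-norm-style selection: the number of nonzero coefficients is
# constrained directly (here fixed by hand to 10 for illustration).
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10).fit(X, y)
omp_support = np.flatnonzero(omp.coef_)

print("truly informative features:", np.flatnonzero(true_coef))
print("lasso support:", lasso_support)
print("L0-style (OMP) support:", omp_support)
```

In the paper, the size of the zero-norm constraint is not fixed by hand as in this sketch; according to the abstract it is derived from the minimax lower bound on model redundancy in stochastic complexity theory, which is what gives the method its automatic feature-selection property.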
Source: Acta Electronica Sinica (《电子学报》; indexed in EI, CAS, CSCD, PKU Core), 2011, No. 2, pp. 370-374 (5 pages).
Fund: National High-Tech R&D Program of China (863 Program) (No. 2006AA01Z411); Science and Technology Support Program of Sichuan Province (No. 08ZC1543).
Keywords: machine learning; bioinformatics; feature selection; regularization; high dimensional