An Automatic Feature Selection Algorithm for High Dimensional Data Based on the Stochastic Complexity Regularization

Cited by: 1
Abstract: Feature selection in high-dimensional, sparse feature spaces is an open problem in machine learning. The prevalent 1-norm regularization approaches share some theoretical drawbacks: they lack the ability to select grouped features, and they cannot select more features than the sample size. This paper considers the sparse modeling problem from the perspective of stochastic complexity theory and derives an easily computable feature selection model, based on a zero-norm constraint, from the minimax lower bound on model redundancy. The proposed algorithm is provably optimizable and performs automatic feature selection like its 1-norm penalized alternatives while overcoming their main drawbacks. Furthermore, it does not rely on parametric assumptions about the true data-generating mechanism, which makes it broadly applicable. Simulations show that the proposed approach performs comparably to the popular 1-norm penalized methods in ordinary data-modeling tasks, and experiments on real gene data further verify that it outperforms several recently published methods in robustness and predictive accuracy on extremely sparse, high-dimensional problems.
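To make the contrast drawn in the abstract concrete, below is a minimal sketch assuming synthetic data and scikit-learn estimators: Lasso for the 1-norm penalty, and OrthogonalMatchingPursuit standing in for a generic zero-norm-constrained selector. It is not the paper's stochastic-complexity-based algorithm (which is not reproduced here); it only illustrates the p >> n feature-selection setting the abstract describes.

```python
# Illustrative sketch only: contrasts 1-norm (lasso) feature selection with a
# generic zero-norm-style selector (orthogonal matching pursuit). This is NOT
# the paper's stochastic-complexity-derived algorithm.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, OrthogonalMatchingPursuit

# High-dimensional regime the abstract targets: far more features than samples.
X, y, true_coef = make_regression(
    n_samples=60, n_features=500, n_informative=10,
    noise=1.0, coef=True, random_state=0,
)

# 1-norm penalty: convex, but it can select at most about n features and
# tends to pick only one feature out of a correlated group.
lasso = Lasso(alpha=0.1).fit(X, y)
lasso_support = np.flatnonzero(lasso.coef_)

# Zero-norm-style selection: the number of nonzero coefficients is
# constrained directly (here fixed by hand to 10 for illustration).
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=10).fit(X, y)
omp_support = np.flatnonzero(omp.coef_)

print("truly informative features:", np.flatnonzero(true_coef))
print("lasso support:", lasso_support)
print("L0-style (OMP) support:", omp_support)
```

In the paper, the size of the zero-norm constraint is not fixed by hand as in this sketch; according to the abstract it is derived from the minimax lower bound on model redundancy in stochastic complexity theory, which is what gives the method its automatic feature-selection property.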
Source: Acta Electronica Sinica (《电子学报》; indexed in EI, CAS, CSCD, PKU Core), 2011, No. 2, pp. 370-374 (5 pages).
Fund: National High-Tech R&D Program of China (863 Program) (No. 2006AA01Z411); Science and Technology Support Program of Sichuan Province (No. 08ZC1543).
Keywords: machine learning; bioinformatics; feature selection; regularization; high dimensional