
Feature selection for high dimensional data based on deep learning and random forest (cited by: 16)
Abstract To address the weak dimensionality reduction and poor stability of existing feature selection algorithms on high dimensional big data, a feature selection algorithm based on deep learning and random forest is proposed. First, a feature elimination algorithm based on random forest reduces the dimensionality of the high dimensional data set. The retained features are then used to train a restricted Boltzmann machine, which determines the machine's model structure and weights. Finally, the parameters learned by the restricted Boltzmann machine initialize a multi-layer neural network, which is trained with the standard back propagation method. Experimental results on several data sets show that the algorithm improves the reduction achieved by feature selection on high dimensional data sets while maintaining high stability and robustness.
Authors FENG Xiao-rong (冯晓荣), QU Guo-qing (瞿国庆). Affiliations: Engineering Training Center, Nantong University, Nantong 226019, China; Internet of Things Technology Research Institute, Jiangsu Vocational College of Business, Nantong 226011, China; Nantong Greatwisdom Information Technology Limited Company, Nantong 226009, China
Source Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2019, No. 9, pp. 2494-2501 (8 pages)
Funding Jiangsu Province 14th Batch "Six Talent Peaks" High-Level Talent Training Fund Project (XYDXX-121)
Keywords feature selection; big data; high dimensional data; deep learning; random forest; restricted Boltzmann machine
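
The abstract names a random-forest-based feature elimination step but gives no details of it. The following is only a minimal sketch of what such a step could look like, not the authors' algorithm. It assumes binary 0/1 labels, uses depth-1 trees (stumps) grown on bootstrap samples with a random feature subset per tree, scores each feature by its summed Gini gain, and drops a fixed fraction of the lowest-scoring features per round. The function names (`gini`, `best_stump`, `forest_importance`, `rf_eliminate`) and the parameters `n_keep`, `drop_frac`, and `n_trees` are hypothetical. The restricted Boltzmann machine and back propagation stages are not covered here.

```python
import random


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_stump(X, y, rows, feats):
    """Return (feature, gain) of the best single-threshold split on the given rows."""
    parent = gini([y[i] for i in rows])
    best_f, best_gain = None, 0.0
    for f in feats:
        values = sorted({X[i][f] for i in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0  # candidate threshold between neighbouring values
            left = [y[i] for i in rows if X[i][f] <= t]
            right = [y[i] for i in rows if X[i][f] > t]
            child = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if parent - child > best_gain:
                best_f, best_gain = f, parent - child
    return best_f, best_gain


def forest_importance(X, y, feats, n_trees, rng):
    """Sum the Gini gain per feature over bootstrapped depth-1 trees (assumed scoring rule)."""
    n = len(X)
    m = max(1, int(len(feats) ** 0.5))  # number of features tried per tree
    score = {f: 0.0 for f in feats}
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]  # bootstrap sample with replacement
        f, gain = best_stump(X, y, rows, rng.sample(feats, m))
        if f is not None:
            score[f] += gain
    return score


def rf_eliminate(X, y, n_keep, drop_frac=0.25, n_trees=100, seed=0):
    """Repeatedly drop the least important features until n_keep remain."""
    rng = random.Random(seed)
    feats = list(range(len(X[0])))
    while len(feats) > n_keep:
        score = forest_importance(X, y, feats, n_trees, rng)
        feats.sort(key=lambda f: score[f], reverse=True)
        n_drop = max(1, int(len(feats) * drop_frac))
        feats = feats[:max(n_keep, len(feats) - n_drop)]
    return sorted(feats)
```

Calling `rf_eliminate(X, y, n_keep=k)` on a list of feature rows `X` and labels `y` returns the sorted column indices of the `k` retained features. In the pipeline the abstract describes, those columns would then be passed on to the restricted Boltzmann machine stage. Re-scoring after every elimination round, rather than ranking once, is a design choice of this sketch. It lets importance shift as competing features disappear.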