An improved feature selection algorithm based on random forest
Abstract: Liver cancer is a malignant tumor of the digestive system with a high incidence in China; it carries a high mortality rate and poses a serious threat to patients. Its prognosis is usually judged only roughly, on the basis of physicians' professional knowledge and accumulated experience, with poor accuracy. Therefore, on the basis of an analysis of the basic principle of the random forest algorithm, this paper proposes an improved feature selection algorithm based on random forest, and a system is implemented in Python that can preprocess data, invoke these algorithms, control their parameters, and display test results. The system is applied to liver cancer prognosis prediction, and the influences of different algorithms, parameters and internal strategies on prediction accuracy and computing performance are compared and analyzed. The results show that, in comparison with a pruned decision tree, the random forest has better generalization ability and training speed, and that the improved feature selection algorithm can significantly reduce the feature set while preserving prediction accuracy.
Authors: 刘云翔 (LIU Yunxiang), 陈斌 (CHEN Bin), 周子宜 (ZHOU Ziyi), School of Computer Science and Information Engineering, Shanghai Institute of Technology, Shanghai 201418, China
Source: Modern Electronics Technique (《现代电子技术》, Peking University Core Journal list), 2019, No. 12, pp. 117-121 (5 pages)
Funding: National Natural Science Foundation of China (61702334); Natural Science Foundation of Shanghai (17ZR1429700)
Keywords: random forest algorithm; feature selection; liver cancer prognosis prediction; decision tree; prediction accuracy; feature set
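
The abstract describes a feature selection algorithm built on the random forest, together with a Python system that runs it and compares prediction accuracy before and after the feature set is reduced. The paper's specific improvement is not spelled out in the abstract, so the following is only a minimal sketch of the common baseline such methods build on: ranking features by random-forest importance and keeping a top fraction. The scikit-learn API, the keep_ratio parameter, and the toy data are assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of random-forest-based feature selection (not the paper's exact method).
# It ranks features by random-forest importance and keeps the most informative fraction,
# then compares cross-validated accuracy on the full and reduced feature sets.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def select_features_by_importance(X, y, keep_ratio=0.5, n_estimators=200, random_state=0):
    """Rank features by random-forest importance and keep the top fraction."""
    forest = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state)
    forest.fit(X, y)
    order = np.argsort(forest.feature_importances_)[::-1]  # most important first
    n_keep = max(1, int(len(order) * keep_ratio))          # size of the reduced feature set
    return np.sort(order[:n_keep])


if __name__ == "__main__":
    # Toy data standing in for the liver cancer prognosis features (assumption).
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 30))
    y = (X[:, 0] + X[:, 3] - X[:, 7] > 0).astype(int)      # only a few features are informative

    selected = select_features_by_importance(X, y, keep_ratio=0.3)
    full = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=5).mean()
    reduced = cross_val_score(RandomForestClassifier(random_state=0), X[:, selected], y, cv=5).mean()
    print(f"kept {len(selected)} of {X.shape[1]} features; "
          f"accuracy full={full:.3f}, reduced={reduced:.3f}")
```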


相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部