Abstract
Correlated and redundant features directly degrade the quality of the features that a random forest draws at random, which weakens the forest's convergence and reduces the model's accuracy, generalization ability, and overall performance. To address this, the paper proposes a random forest optimization algorithm that incorporates approximate Markov blankets. The algorithm uses approximate Markov blankets to build groups of similar features, then draws features proportionally from each group to form the feature subset of a single decision tree, and repeats this process until the forest reaches its target size. While preserving the diversity of the features used across the forest, the algorithm uses approximate Markov blankets to remove correlation and redundancy among features, which improves the quality of the randomly drawn features. Comparative experiments on 12 UCI datasets of different dimensionality show that the random forest incorporating approximate Markov blankets eliminates feature correlation and redundancy to a certain extent, improves the model's evaluation metrics, generalizes better, and is better suited to high-dimensional data.
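The grouping and sampling steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not define the approximate Markov blanket or the sampling proportion, so the sketch assumes the common symmetric-uncertainty (FCBF-style) criterion on discrete features and a fixed per-group ratio. All function names and the `ratio` parameter are hypothetical, and the sketch stops at producing per-tree feature subsets without training any trees.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """SU(x, y) = 2 * I(x; y) / (H(x) + H(y)), a value in [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0
    hxy = entropy(list(zip(x, y)))
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def build_similar_groups(columns, labels):
    """Group features around leaders that approximately blanket them.

    Assumed criterion: feature i blankets feature j when
    SU(i, label) >= SU(j, label) and SU(i, j) >= SU(j, label).
    """
    relevance = [symmetric_uncertainty(col, labels) for col in columns]
    # Visit features from most to least relevant, so each group leader is
    # at least as relevant as every feature still unassigned.
    order = sorted(range(len(columns)), key=lambda i: -relevance[i])
    groups, assigned = [], set()
    for i in order:
        if i in assigned:
            continue
        group = [i]
        assigned.add(i)
        for j in order:
            if j in assigned:
                continue
            if symmetric_uncertainty(columns[i], columns[j]) >= relevance[j]:
                group.append(j)
                assigned.add(j)
        groups.append(group)
    return groups


def sample_tree_features(groups, ratio, rng):
    """Draw about ratio * len(group) features (at least one) per group."""
    subset = []
    for group in groups:
        k = min(len(group), max(1, math.ceil(ratio * len(group))))
        subset.extend(rng.sample(group, k))
    return sorted(subset)


def forest_feature_subsets(columns, labels, n_trees, ratio=0.5, seed=0):
    """Return the groups and one feature subset per tree in the forest."""
    rng = random.Random(seed)
    groups = build_similar_groups(columns, labels)
    subsets = [sample_tree_features(groups, ratio, rng) for _ in range(n_trees)]
    return groups, subsets
```

Each subset returned by `forest_feature_subsets` would then be used to train one decision tree. Because every subset takes features from every group, redundant copies of the same signal are less likely to crowd out other features in a single tree. Note that under the assumed criterion a feature with zero relevance to the label is absorbed into the first group, so a relevance threshold may be needed in practice.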
Authors
LUO Jigen
XIONG Lingzhu
DU Jianqiang
NIE Bin
XIONG Wangping
LI Zhiqin
Affiliations: College of Computer Science, Jiangxi University of Chinese Medicine, Nanchang 330004, China; Information Office, Jiangxi Normal University, Nanchang 330022, China
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
Peking University Core Journal
2023, Issue 20, pp. 77-84 (8 pages)
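Funding
National Key Research and Development Program of China (2019YFC1712301)
National Natural Science Foundation of China (62141202, 61762051, 82160955)
General Program of the Natural Science Foundation of Jiangxi Province (20202BAB202019)
Science and Technology Research Project of the Jiangxi Provincial Department of Education (GJJ190683, GJJ201232)
Project of the Health and Family Planning Commission of Jiangxi Province (2020B0409)
University-Level Science and Technology Innovation Team Development Program of Jiangxi University of Chinese Medicine (CXTD22015)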
Keywords
random forest
approximate Markov blanket
feature selection
high-dimensional samples