Abstract
In machine learning, randomness is a crucial factor in the success of ensemble learning, and it can be injected into tree-based ensembles by rotating the feature space. In common practice, however, the feature space is rotated randomly, so a large number of trees is required to ensure the performance of the ensemble model. This random rotation method is theoretically feasible, but it requires massive computing resources, potentially restricting its applications. To solve this problem, this paper proposes a multimodal genetic algorithm based rotation forest (MGARF) algorithm. It is a tree-based ensemble learning algorithm for classification that exploits the characteristics of trees to inject randomness by feature rotation. Unlike purely random rotation, it attempts to select a subset of more diverse and accurate base learners using a multimodal optimization method. The classification accuracy of the proposed MGARF algorithm was evaluated by comparing it with the original random forest and random rotation ensemble methods on 23 UCI classification datasets. Experimental results show that the MGARF method outperforms the other methods, and that MGARF models use far fewer base learners.
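The random feature-space rotation mentioned in the abstract can be illustrated with a minimal sketch. It assumes a generic construction (Gram-Schmidt orthonormalization of Gaussian vectors) and is not the procedure used in the paper. The names `random_orthogonal` and `rotate` are hypothetical. The resulting matrix is orthogonal, so it may include a reflection as well as a rotation, which does not matter for the purpose of presenting each tree with a differently oriented feature space.

```python
import math
import random


def random_orthogonal(dim, rng):
    """Return a dim x dim matrix whose rows are orthonormal.

    Hypothetical helper: Gram-Schmidt on Gaussian vectors, a generic
    construction rather than the method from the paper.
    """
    rows = []
    while len(rows) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Remove the components along the rows accepted so far.
        for r in rows:
            proj = sum(a * b for a, b in zip(v, r))
            v = [a - proj * b for a, b in zip(v, r)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm > 1e-12:  # skip a (very unlikely) degenerate draw
            rows.append([a / norm for a in v])
    return rows


def rotate(sample, matrix):
    """Apply the orthogonal transform to one feature vector."""
    return [sum(m * x for m, x in zip(row, sample)) for row in matrix]


rng = random.Random(0)          # fixed seed so the sketch is repeatable
R = random_orthogonal(3, rng)   # one transform per base learner in practice
x = [1.0, 2.0, 3.0]
x_rot = rotate(x, R)            # same length as x, different coordinates
```

In a random rotation ensemble, each tree would be trained on data transformed by its own matrix. A selection step such as the one the abstract describes would then keep only a subset of the resulting trees.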
Authors
XU Zhe
NI Wei-chen
JI Yue-hui
徐喆; 倪维晨; 吉月辉 (School of Electrical and Electronic Engineering, Tianjin University of Technology, Tianjin 300384, China; Academic Affairs Office, Tianjin University of Technology, Tianjin 300384, China; Tianjin Key Laboratory for Control Theory & Applications in Complicated Industry Systems, Tianjin 300384, China)
Funding
Project (61603274) supported by the National Natural Science Foundation of China
Project (2017KJ249) supported by the Research Project of Tianjin Municipal Education Commission, China.