Abstract
In many fields today, tens of thousands of feature variables can be collected, while the number of samples available for training is far smaller than the number of features. Using feature selection to reduce the dimensionality of the data and improve algorithm performance has therefore become a primary task. The three mainstream families of feature selection methods are filter, wrapper, and embedded methods. Recently, however, feature selection based on Evolutionary Computing (EC) techniques has attracted growing attention, and existing experiments show that EC techniques can achieve better performance. This paper proposes a Group Search Optimizer (GSO) feature selection algorithm based on predictive operators (PGSO). First, a mutation operator based on roulette wheel selection is introduced into GSO: the value of one dimension of a particle is selected for mutation according to the mutation probability, and the mutation is retained only if the mutated particle has a better fitness value. This maintains population diversity and improves the search performance of the algorithm. Second, a predictive (forecasting) operator is added to GSO: 5% of the particles in the population learn from the producer's historical best position in order to predict the position of the next producer, which greatly accelerates the search. Finally, PGSO is evaluated on six datasets from the UCI repository. Compared with feature selection algorithms based on Particle Swarm Optimization (PSO), the basic GSO, and Competitive Selection Optimization (CSO), the results show that on single-objective feature selection problems PGSO achieves a low error rate and fast convergence and is less likely to become trapped in local optima.
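The mutation step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the encoding, the roulette weights, or the fitness function, so everything below is an assumption made for illustration: particles are binary feature masks, the per-dimension `weights` are supplied by the caller, and `fitness` is an error measure where lower is better. The names `roulette_pick` and `mutate_if_better` are hypothetical and are not taken from the paper.

```python
import random
from typing import Callable, List, Sequence


def roulette_pick(weights: Sequence[float], rng: random.Random) -> int:
    """Return an index with probability proportional to its weight."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    threshold = rng.random() * total
    running = 0.0
    for index, weight in enumerate(weights):
        running += weight
        if threshold < running:
            return index
    return len(weights) - 1  # guard against floating-point round-off


def mutate_if_better(
    mask: List[int],
    weights: Sequence[float],
    fitness: Callable[[List[int]], float],
    mutation_prob: float,
    rng: random.Random,
) -> List[int]:
    """Flip one roulette-selected bit and keep the flip only if the error drops.

    Assumptions (not from the paper): binary mask encoding, caller-supplied
    roulette weights, and a fitness function where lower values are better.
    """
    if rng.random() >= mutation_prob:
        return mask  # no mutation attempted this round
    dim = roulette_pick(weights, rng)
    candidate = list(mask)
    candidate[dim] = 1 - candidate[dim]  # toggle whether feature `dim` is selected
    return candidate if fitness(candidate) < fitness(mask) else mask
```

In a feature selection setting, `fitness` would typically wrap a classifier's validation error on the features selected by the mask. The greedy acceptance rule mirrors the abstract's statement that a mutation is kept only when the mutated particle's fitness improves.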
Authors
Chen Haijuan (陈海娟)
Feng Xiang (冯翔)
Yu Huiqun (虞慧群)
Affiliations: School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China; Smart City Collaborative Innovation Center, Shanghai Jiao Tong University, Shanghai 200240, China
Source
Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》)
CAS
CSCD
Peking University Core Journal (北大核心)
2018, No. 6, pp. 1206-1215 (10 pages)
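Funding
National Natural Science Foundation of China (61472139, 61462073)
Shanghai Municipal Commission of Economy and Informatization, "Special Fund for Informatization Development" (201602008)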
Keywords
feature selection
PGSO
roulette wheel selection
mutation operator
forecasting operator