Abstract
In many real-world machine learning tasks one repeatedly faces the problem of selecting a subset of variables from a large ground set, known as subset selection; solving it exactly is NP-hard. The recently proposed Pareto Optimization for Subset Selection (POSS) method, a multi-objective evolutionary approach, currently achieves the best performance both in theory and in experiments. However, when the problem size is large, the running time of POSS becomes unsatisfactory, which hinders its application to large-scale problems. This paper proposes DPOSS, a multi-objective evolutionary subset selection algorithm based on a decomposition strategy. DPOSS decomposes the entire subset space into several subspaces and applies POSS to them sequentially. Theoretical analysis shows that DPOSS achieves the same approximation guarantee as POSS, while its running time decreases superlinearly with respect to the number of decompositions. Empirical studies confirm this analysis and show that the quality of the solutions produced by DPOSS degrades only slightly as the number of decompositions grows, yet remains better than that of the greedy algorithm, the previous algorithm with the best known theoretical guarantee.
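To make the decomposition idea above concrete, the following is a minimal Python sketch of a POSS-style bi-objective search together with a block-wise wrapper in the spirit of DPOSS. The function names (poss, dposs), the round-robin block split, the per-block budget allocation, and the toy coverage objective are illustrative assumptions only; they do not reproduce the exact procedure, constraint handling, or parameter settings analyzed in the paper.

    import random

    def poss(ground_set, f, k, n_iters, base=frozenset()):
        """Sketch of Pareto Optimization for Subset Selection (POSS).

        Maintains an archive of mutually non-dominated solutions under the
        bi-objective (maximize f(base | S), minimize |S|). `base` holds
        variables already fixed by earlier subspaces (used by dposs below).
        Skipping children with |S| > k is a simplification of how the paper
        handles the size constraint.
        """
        archive = {frozenset(): f(base)}  # subset of this subspace -> objective value
        for _ in range(n_iters):
            parent = random.choice(list(archive))
            # Bit-wise mutation: toggle each variable with probability 1/|subspace|.
            child = set(parent)
            for v in ground_set:
                if random.random() < 1.0 / max(1, len(ground_set)):
                    child.symmetric_difference_update({v})
            child = frozenset(child)
            if len(child) > k:
                continue
            val = f(base | child)
            # Keep the child only if no archived solution weakly dominates it.
            dominated = any(f_s >= val and len(s) <= len(child)
                            for s, f_s in archive.items())
            if not dominated:
                archive = {s: f_s for s, f_s in archive.items()
                           if not (val >= f_s and len(child) <= len(s))}
                archive[child] = val
        return max(archive, key=lambda s: archive[s])

    def dposs(variables, f, k, n_blocks, iters_per_block):
        """Sketch of a decomposition wrapper: split the variables into blocks and
        run POSS on each block in turn, fixing what earlier blocks selected."""
        variables = list(variables)
        random.shuffle(variables)
        blocks = [variables[i::n_blocks] for i in range(n_blocks)]
        selected = frozenset()
        for i, block in enumerate(blocks):
            # Spread the remaining cardinality budget over the remaining blocks
            # (an illustrative choice, not necessarily the paper's allocation).
            budget = -(-(k - len(selected)) // (n_blocks - i))
            chosen = poss(block, f, budget, iters_per_block, base=selected)
            selected = selected | chosen
        return selected

    # Toy usage: maximize coverage of a small universe (hypothetical data).
    items = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: {1, 4, 5}}
    coverage = lambda S: len(set().union(*(items[v] for v in S))) if S else 0
    print(dposs(list(items), coverage, k=2, n_blocks=2, iters_per_block=300))

The sketch keeps only the essential mechanism: a Pareto archive over (objective value, subset size) updated by bit-wise mutation, with the variables chosen in earlier subspaces held fixed while later subspaces are searched.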
Source
《中国科学:信息科学》 (Scientia Sinica Informationis)
Indexed in CSCD and the Peking University Core Journals list (北大核心)
2016, No. 9, pp. 1276-1287 (12 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant Nos. 61333014, 61321491)
Keywords
machine learning
subset selection
multi-objective optimization
multi-objective evolutionary algorithm
decomposition