Abstract
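Ensemble pruning (selective ensemble) is one of the current research hotspots in machine learning. Because ensemble pruning is an NP-hard problem, researchers mostly use heuristic methods that recast it as another problem in order to obtain a near-optimal solution. Since the algorithms start from different premises and are described from different perspectives, the large body of existing ensemble pruning algorithms appears cluttered and unsystematic. To help researchers quickly understand and apply the latest advances in the field, this paper classifies ensemble pruning algorithms into four categories according to the core strategy of the selection process: iterative optimization, ranking, clustering, and pattern mining. It then uses 20 commonly used datasets from the UCI repository to compare representative algorithms experimentally in terms of prediction performance, selection time, and the size of the resulting ensemble. Finally, it summarizes the strengths and weaknesses of each category and outlines future research priorities for ensemble pruning.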
Ensemble pruning is an active research direction in machine learning. Because ensemble pruning is an NP-hard problem, most researchers use heuristics to obtain near-optimal solutions. Many ensemble pruning approaches already exist in the literature, but because they are based on different perspectives, it is difficult to understand them clearly. In this paper, ensemble pruning approaches are divided into four categories according to their pruning strategies: optimization-based, ranking-based, clustering-based, and pattern mining-based. The popular algorithms of each category are then implemented, tested on 20 datasets from the UCI repository, and compared on three facets: prediction performance, pruning time, and the size of the final ensemble. The advantages and disadvantages of each category are analyzed. The paper ends with conclusions and directions for future work.
Source
Computer Engineering & Science (《计算机工程与科学》)
CSCD
Peking University Core Journal (北大核心)
2012, No. 2, pp. 134-138 (5 pages)
Funding
National Natural Science Foundation of China (Grant Nos. 60905032, 60773017)
Keywords
ensemble learning
ensemble pruning
optimization-based pruning
ranking-based pruning
clustering-based pruning
pattern mining-based pruning