Abstract
From the perspective of diversity, this paper studies ensemble learning algorithms based on feature-set techniques (which form training sets by selecting different feature sets according to some strategy) and on data techniques (which select different training sets by sampling), and analyzes how each of the two kinds of algorithm creates diversity. Using decision trees and neural networks as base models, the performance of these ensemble learning algorithms is evaluated experimentally on 10 standard data sets. The results show that performance depends on factors such as the characteristics of the data set and the method used to create diversity. In terms of overall performance, data-based ensemble learning algorithms outperform feature-set-based ones on most of the data sets.
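The two sources of diversity contrasted in the abstract can be illustrated with a small sketch. This is not the paper's implementation. The abstract does not name specific algorithms, so bootstrap sampling (for the data-based technique), uniform random feature subsets (for the feature-set-based technique), majority voting, and all helper names below are assumptions made for illustration only.

```python
import random
from collections import Counter


def bootstrap_indices(n_rows, rng):
    """Data-based diversity (assumed: bootstrap): draw n_rows row indices with replacement."""
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def feature_subset(n_features, k, rng):
    """Feature-set-based diversity (assumed: uniform choice): pick k distinct feature indices."""
    return sorted(rng.sample(range(n_features), k))


def majority_vote(predictions):
    """Combine the labels predicted by the base models for one example."""
    return Counter(predictions).most_common(1)[0][0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n_rows, n_features = 8, 5  # hypothetical data set shape

# Each base model would be trained on its own view of the training set:
# different rows (data-based) or different columns (feature-set-based).
data_views = [bootstrap_indices(n_rows, rng) for _ in range(3)]
feature_views = [feature_subset(n_features, 3, rng) for _ in range(3)]
```

In either scheme the base models (decision trees or neural networks in the paper) differ because they see different training views, and their outputs are then combined, here by a simple vote such as `majority_vote(["a", "b", "a"])`.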
Source
Computer Engineering (《计算机工程》)
CAS
CSCD
PKU Core (北大核心)
2008, No. 6, pp. 35-37 (3 pages)
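Funding
Supported by the Hebei Provincial Department of Education fund project "Research on Diversity in Ensemble Learning and Its Unified Model" (2006406)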
Keywords
diversity
ensemble learning
feature set
sampling
performance