Abstract
To learn the ensemble function and improve classification performance, a two-phase ensemble learning method named TPEL (Two-Phases Ensemble Learning) is proposed. TPEL is evaluated on spam filtering, a typical two-class text classification problem, in a series of experiments on four publicly available datasets. The experimental results show that: (1) the performance of TPEL is only slightly affected by the number of individual classifiers combined; (2) TPEL is particularly effective when combining multiple heterogeneous classifiers; (3) when combining multiple homogeneous classifiers, TPEL outperforms comparison algorithms such as Naive Bayes, Bagging, and Boosting in most experiments, and it works well whether the base learner is stable or unstable; (4) TPEL has relatively low time complexity.
Source
Journal of Beijing University of Technology (《北京工业大学学报》)
Indexed in: EI, CAS, CSCD, PKU Core (北大核心)
2010, No. 3, pp. 410-419 (10 pages)
Funding
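National Natural Science Foundation of China (60673015)
Hebei Province Science and Technology Program (09213515D)
Natural Science Project of the Hebei Provincial Department of Education (200872)
Doctoral Research Start-up Fund of Shijiazhuang University of Economics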
Keywords
machine learning
data mining
text processing
classification