
Combining Classifiers Based on Two-phase Ensemble Learning

Cited by: 4
Abstract: To learn the ensemble function and improve classification performance, a two-phase ensemble learning method (TPEL) is proposed. TPEL is evaluated on spam filtering, a typical two-class text classification problem, in a series of experiments on four publicly available datasets. The experimental results show that: (1) the performance of TPEL is only slightly affected by the number of individual classifiers being combined; (2) TPEL is particularly effective when it combines multiple heterogeneous classifiers; (3) when it combines multiple homogeneous classifiers, TPEL outperforms the comparison algorithms, such as Naive Bayes, Bagging, and Boosting, in most of the experiments; (4) TPEL performs well whether the base learner is stable or unstable; and (5) TPEL has relatively low time complexity.
Source: Journal of Beijing University of Technology (《北京工业大学学报》), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2010, No. 3, pp. 410-419 (10 pages)
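Funding: National Natural Science Foundation of China (60673015); Hebei Province Science and Technology Program (09213515D); Natural Science Project of the Hebei Provincial Department of Education (200872); Doctoral Research Start-up Fund of Shijiazhuang University of Economics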
Keywords: machine learning; data mining; text processing; classification