Abstract
Unlike ensemble methods, a single classifier often cannot achieve relatively high and stable accuracy. To address this problem, this paper proposes a classification model that integrates multiple random forests and combines their outputs by majority voting with a threshold. The implementation consists of three levels: building the ensemble classification model, preliminary prediction of instances, and combination analysis. The model is implemented in the MapReduce programming model and evaluated on P2P traffic identification, where it is compared with a single random forest and with ensembles built from other algorithms. The experiments show that the proposed model achieves better overall classification performance in P2P traffic identification and offers a viable reference method for two-class classification.
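The combination step described above can be illustrated with a minimal sketch. The abstract does not define how the threshold is applied, so the rule below is an assumption: the winning label is accepted only when its share of the votes strictly exceeds the threshold, and a caller-supplied fallback is returned otherwise. The function name `threshold_majority_vote`, its parameters, and the label strings are hypothetical and do not come from the paper.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def threshold_majority_vote(
    votes: Sequence[Hashable],
    threshold: float = 0.5,
    fallback: Optional[Hashable] = None,
) -> Optional[Hashable]:
    """Combine per-forest predictions for one instance.

    Assumed rule (not taken from the paper): return the most common
    label only if its vote share is strictly greater than `threshold`;
    otherwise return `fallback`.
    """
    if not votes:
        raise ValueError("votes must not be empty")
    # Restricting the threshold to [0.5, 1) means a tie can never win.
    if not 0.5 <= threshold < 1:
        raise ValueError("threshold must be in [0.5, 1)")
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) > threshold else fallback


# Five hypothetical random forests vote on one traffic flow.
votes = ["p2p", "p2p", "non-p2p", "p2p", "non-p2p"]
print(threshold_majority_vote(votes, threshold=0.5))  # 3/5 exceeds 0.5
print(threshold_majority_vote(votes, threshold=0.7))  # 3/5 does not exceed 0.7
```

Raising the threshold trades coverage for confidence: fewer instances receive a label, but each accepted label is backed by a larger share of the forests.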
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journal (北大核心)
2015, No. 6, pp. 1621-1624, 1629 (5 pages)
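Funding
National Science and Technology Major Project, sub-project (2012ZX03005002-005)
Chongqing Application Development Program (cstc2013yykf A40006)
2013 Chongqing Universities Innovation Team Building Program (KJTD201312)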