Abstract
AdaBoost.M2 has shortcomings when applied to the classification of multiclass imbalanced protocol traffic. To address this, this paper proposes RBWS-ADAM2, an ensemble learning algorithm for multiclass imbalanced classification of Internet protocol traffic. In each iteration of AdaBoost.M2, the algorithm preprocesses the training data with a weight-based random balanced resampling strategy. The strategy sets the sampling balance point at random to change the relative numbers of majority-class and minority-class samples, which yields multiple diverse training sets. It also uses sample weights as the criterion for sample selection, retaining high-weight samples as far as possible to strengthen learning on them. Experiments on internationally published protocol traffic datasets compare RBWS-ADAM2 with other similar algorithms. The results show that RBWS-ADAM2 not only substantially improves the F-measure of some minority classes but also raises the overall G-mean and the overall average F-measure of the ensemble classifier, markedly enhancing its overall performance.
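The resampling step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the helper name `weighted_random_balance` is hypothetical, the balance point is drawn uniformly between the smallest and largest class sizes, larger classes are cut down by keeping their highest-weight samples, and smaller classes are topped up by drawing duplicates with probability proportional to weight. It is not the authors' implementation.

```python
import random
from collections import defaultdict


def weighted_random_balance(labels, weights, rng):
    """Return (indices, balance) for one resampled training set.

    Hypothetical helper: a sketch of weight-based random balanced
    resampling under the assumptions stated in the lead-in.
    Weights are assumed to be strictly positive.
    """
    # Group sample indices by class label.
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    # Assumption: the balance point is a random class size between the
    # smallest and the largest class (both bounds inclusive).
    sizes = [len(idx) for idx in by_class.values()]
    balance = rng.randint(min(sizes), max(sizes))

    chosen = []
    for idx in by_class.values():
        if len(idx) > balance:
            # Undersample: keep only the `balance` highest-weight samples.
            ranked = sorted(idx, key=lambda i: weights[i], reverse=True)
            chosen.extend(ranked[:balance])
        else:
            # Keep every original sample of this class.
            chosen.extend(idx)
            extra = balance - len(idx)
            if extra > 0:
                # Oversample: duplicate samples, favoring high weights.
                class_weights = [weights[i] for i in idx]
                chosen.extend(rng.choices(idx, weights=class_weights, k=extra))
    return chosen, balance


# Toy usage with made-up protocol labels and random positive weights.
rng = random.Random(0)
labels = ["http"] * 10 + ["dns"] * 3 + ["ftp"] * 5
weights = [rng.random() + 0.01 for _ in labels]
indices, balance = weighted_random_balance(labels, weights, rng)
```

In a boosting loop, a function like this would be called once per iteration with the current sample weights, so each weak learner sees a differently balanced training set. Duplicating samples is only one possible way to oversample; synthetic-sample generation could be swapped in.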
Authors
Zhang Renbin (张仁斌)
Zhang Jie (张杰)
Wu Pei (吴佩)
School of Computer & Information, Hefei University of Technology, Hefei 230009, China
Source
Application Research of Computers (《计算机应用研究》)
CSCD
Peking University Core Journals (北大核心)
2019, Issue 6, pp. 1863-1867 (5 pages)
Keywords
traffic classification
ensemble learning algorithm
multiclass imbalance
generalization performance