Abstract
Passive machine learning requires a large amount of labeled training data for P2P network traffic identification. To address this problem, an improved active learning mechanism is proposed and combined with an SVM classification model for P2P network traffic identification. When unlabeled samples are screened with the tournament method, the concept of sample diversity is introduced to avoid the premature convergence of active learning caused by the assimilation of labeled samples. A dynamic threshold adjustment factor is used to accelerate the convergence of active learning, and an overfitting-sample filtering strategy is added to strengthen the generalization ability of the classification model. Theoretical analysis and experimental results show that the mechanism effectively improves the utilization of unlabeled samples, avoids the premature convergence and overfitting that active learning may produce, and improves the accuracy of P2P network traffic identification.
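The abstract does not give the selection procedure in detail, so the following is only a minimal sketch of how tournament-based screening with a diversity check might look. It assumes that each unlabeled flow already has an uncertainty score (here the absolute SVM decision value, where smaller means closer to the boundary) and that diversity is measured by Euclidean distance. The names `tournament_select`, `min_dist`, and `tournament_size` are hypothetical and do not come from the paper; the dynamic threshold and the overfitting-sample filter are not covered.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def tournament_select(pool, margins, k, tournament_size=3, min_dist=0.5, seed=0):
    """Pick up to k indices from pool to send for labeling.

    pool     : list of feature vectors for unlabeled flows
    margins  : |decision value| of each vector under the current classifier
               (smaller = closer to the boundary = more informative)
    min_dist : hypothetical diversity threshold (an assumption, not from the
               paper); a winner closer than this to an already chosen sample
               is skipped
    """
    rng = random.Random(seed)
    remaining = list(range(len(pool)))
    chosen = []
    while remaining and len(chosen) < k:
        # Draw a small random tournament and keep the most uncertain entrant.
        entrants = rng.sample(remaining, min(tournament_size, len(remaining)))
        winner = min(entrants, key=lambda i: margins[i])
        remaining.remove(winner)
        # Diversity check: reject winners that nearly duplicate a chosen sample.
        if all(euclidean(pool[winner], pool[j]) >= min_dist for j in chosen):
            chosen.append(winner)
    return chosen
```

In this sketch a near-duplicate of an already selected sample is discarded rather than queued for labeling, which is one simple way to keep the labeled set from becoming homogeneous. The paper's actual diversity measure may differ.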
Source
Modern Computer (《现代计算机》)
2014, No. 8, pp. 3-6, 15 (5 pages in total)
Keywords
P2P Network Traffic Identification
Active Learning
Dynamic Threshold
Overfitting Samples