Abstract
Labeling the application protocol of Internet traffic is difficult. To improve the accuracy of Internet protocol identification, a semi-supervised learning model based on Bayesian networks is proposed. The model first trains a Bayesian network classifier on a small number of labeled samples and uses it to produce an initial classification of the unlabeled samples. It then selects the unlabeled samples with the lowest classification loss, adds them to the training set, and retrains the classifier; the final classifier is obtained after several such rounds. Because the model can be trained on labeled and unlabeled samples together, it is well suited to identifying Internet application protocols, for which labels are hard to obtain. Experimental results show that, when few labeled samples are available, the model is more accurate and more stable than both the naive Bayes model and the Bayesian network model, making it an effective way to improve the accuracy of Internet protocol identification.
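The training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: a categorical naive Bayes classifier stands in for the Bayesian network, the classification loss is assumed to be the expected 0-1 loss (one minus the largest posterior), and the names `NaiveBayes`, `self_train`, `per_round`, and `max_rounds`, as well as the toy flow features, are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with Laplace smoothing.

    Assumption: used here as a simple stand-in for the Bayesian
    network classifier named in the abstract.
    """

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.class_count = Counter(y)
        self.n = len(y)
        self.counts = defaultdict(Counter)  # (feature index, class) -> value counts
        self.values = defaultdict(set)      # feature index -> values seen
        for row, label in zip(X, y):
            for j, v in enumerate(row):
                self.counts[(j, label)][v] += 1
                self.values[j].add(v)
        return self

    def posterior(self, row):
        """Return {class: P(class | row)} computed in log space."""
        log_p = {}
        for c in self.classes:
            lp = math.log(self.class_count[c] / self.n)
            for j, v in enumerate(row):
                num = self.counts[(j, c)][v] + 1
                # The extra +1 reserves probability mass for unseen values.
                den = self.class_count[c] + len(self.values[j]) + 1
                lp += math.log(num / den)
            log_p[c] = lp
        m = max(log_p.values())
        z = sum(math.exp(lp - m) for lp in log_p.values())
        return {c: math.exp(lp - m) / z for c, lp in log_p.items()}


def self_train(X_lab, y_lab, X_unlab, per_round=1, max_rounds=100):
    """Move the lowest-loss unlabeled samples into the training set, then retrain."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    model = NaiveBayes().fit(X_lab, y_lab)
    for _ in range(max_rounds):
        if not pool:
            break
        scored = []
        for i, row in enumerate(pool):
            post = model.posterior(row)
            label = max(post, key=post.get)
            # Assumed loss: expected 0-1 loss of predicting `label`.
            scored.append((1.0 - post[label], i, label))
        scored.sort()
        chosen = scored[:per_round]
        for _, i, label in chosen:
            X_lab.append(pool[i])
            y_lab.append(label)
        taken = {i for _, i, _ in chosen}
        pool = [row for i, row in enumerate(pool) if i not in taken]
        model = NaiveBayes().fit(X_lab, y_lab)
    return model


# Toy flows described by two discretized features (hypothetical data).
X_lab = [("low", "small"), ("low", "small"), ("high", "large"), ("high", "large")]
y_lab = ["dns", "dns", "p2p", "p2p"]
X_unlab = [("low", "small"), ("high", "large"), ("low", "large"), ("high", "small")]

model = self_train(X_lab, y_lab, X_unlab)
post = model.posterior(("low", "small"))
prediction = max(post, key=post.get)
```

Adding only the most confidently classified samples in each round is meant to limit how many wrong pseudo-labels enter the training set; `per_round` trades training time against that risk.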
Source
《华中科技大学学报(自然科学版)》
EI
CAS
CSCD
Peking University Core Journals
2012, No. 9, pp. 44-47, 71 (5 pages)
Journal of Huazhong University of Science and Technology (Natural Science Edition)
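Funding
National High Technology Research and Development Program of China (2009AA01Z424)
Special Fund of the Education Department of Shaanxi Province (12JK0933)
Basic Research Fund of Northwestern Polytechnical University (JC201149)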
Keywords
Bayesian networks
Internet
semi-supervised learning
loss function
traffic identification