Network traffic classification based on ensemble learning and co-training (Cited by: 5)

Abstract: Classification of network traffic is an essential step in much network research. However, with the rapid evolution of Internet applications, the effectiveness of port-based and payload-based identification approaches has greatly diminished in recent years, and many researchers have turned their attention to machine learning-based methods as an alternative. This paper presents a novel machine learning-based classification model that combines the ensemble learning paradigm with co-training techniques. Whereas most previous approaches employed only a single classifier, our method applies multiple classifiers and semi-supervised learning, which mainly helps to overcome three shortcomings: limited flow accuracy rate, weak adaptability, and heavy demand for labeled training data. Statistical characteristics of IP flows are extracted from packet-level traces to establish the feature set; the classification model is then created and tested, and the empirical results demonstrate its feasibility and effectiveness.
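The abstract does not spell out the algorithm, so the following is only a minimal sketch of how bagging-style ensembles and co-training can be combined in general, not the authors' model. Everything in it is an assumption made for illustration: the two feature "views" (packet-size statistics and inter-arrival statistics), the nearest-centroid base learner, the synthetic `web`/`p2p` flows, and all function names and parameters.

```python
import random
from collections import Counter

def centroid_fit(X, y):
    """Hypothetical base learner: one mean feature vector per class."""
    sums, counts = {}, Counter(y)
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def centroid_predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(model[c], x)))

def bagging_fit(X, y, n_models, rng):
    """Bagging: train each base learner on a bootstrap resample."""
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        models.append(centroid_fit([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Majority vote; confidence is the share of models that agree."""
    votes = Counter(centroid_predict(m, x) for m in models)
    label, n = votes.most_common(1)[0]
    return label, n / len(models)

def co_train(views_l, y_l, views_u, rounds=5, per_round=4, n_models=7, seed=0):
    """Co-training: one ensemble per view; each round, every ensemble
    pseudo-labels the unlabeled flows it is most confident about and
    adds them to the shared labeled set."""
    rng = random.Random(seed)
    views_l = [list(v) for v in views_l]
    y_l = list(y_l)
    pool = list(range(len(views_u[0])))
    for _ in range(rounds):
        if not pool:
            break
        ensembles = [bagging_fit(v, y_l, n_models, rng) for v in views_l]
        picked = {}
        for k, ens in enumerate(ensembles):
            scored = [(bagging_predict(ens, views_u[k][i]), i) for i in pool]
            scored.sort(key=lambda t: -t[0][1])  # most confident first
            for (label, _), i in scored[:per_round]:
                picked.setdefault(i, label)
        for i, label in picked.items():
            for k in range(len(views_l)):
                views_l[k].append(views_u[k][i])
            y_l.append(label)
            pool.remove(i)
    return [bagging_fit(v, y_l, n_models, rng) for v in views_l]

def classify(ensembles, x_views):
    """Pool the votes of both per-view ensembles."""
    votes = Counter()
    for ens, x in zip(ensembles, x_views):
        for m in ens:
            votes[centroid_predict(m, x)] += 1
    return votes.most_common(1)[0][0]

def make_flows(labels, rng):
    """Synthetic flows (assumption): view 0 stands in for packet-size
    statistics, view 1 for inter-arrival-time statistics."""
    a, b = [], []
    for c in labels:
        mu = 0.0 if c == "web" else 4.0
        a.append([rng.gauss(mu, 1.0), rng.gauss(mu, 1.0)])
        b.append([rng.gauss(-mu, 1.0)])
    return [a, b], list(labels)

rng = random.Random(42)
lab, y_lab = make_flows(["web", "p2p"] * 3, rng)            # 6 labeled flows
unl, _ = make_flows([rng.choice(["web", "p2p"]) for _ in range(60)], rng)
test, y_test = make_flows(["web", "p2p"] * 50, rng)
ensembles = co_train(lab, y_lab, unl)
hits = sum(classify(ensembles, [test[0][i], test[1][i]]) == y_test[i]
           for i in range(len(y_test)))
accuracy = hits / len(y_test)
print(f"accuracy on synthetic test flows: {accuracy:.2f}")
```

The sketch starts from only six labeled flows and lets the two ensembles grow the labeled set from the unlabeled pool, which is the general idea behind reducing the demand for labeled training data. A real traffic classifier would replace the synthetic generator with flow statistics extracted from packet traces and would likely use stronger base learners.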

Authors: HE HaiTao, LUO XiaoNan, MA FeiTeng, CHE ChunHui, WANG JianMin

Affiliations: School of Information Science and Technology; Key Laboratory of Digital Life (Sun Yat-sen University); Information and Network Center

Source: Science in China (Series F) [中国科学（F辑英文版）], 2009, Issue 2, pp. 338-346 (9 pages)

Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 60525213 and 60776096), the National Basic Research Program of China (Grant No. 2006CB303106), the National High-Tech Research & Development Program of China (Grant Nos. 2007AA01Z236 and 2007AA01Z449), the Joint Funds of NSFC-Guangdong (Grant No. U0735001), and the National Project of Scientific and Technical Supporting Programs (Grant No. 2007BAH13B01)

Keywords: traffic classification; ensemble learning; co-training; network measurement

Classification code: TP393 [Automation and Computer Technology: Computer Application Technology]
