Abstract
Mobile Internet traffic classification and clustering are an important foundation for effective network traffic management. However, existing studies collected their mobile traffic from different sources, labeled it at different granularities, and described it with different feature sets, so their experimental results cannot be compared directly. This paper uses the MobileGT system to collect the network traffic generated by mobile Apps, labels the traffic at two granularities (App level and function level), and extracts separate feature sets from uni-directional and bi-directional flows. On this basis, it comprehensively evaluates the classification and clustering performance of various machine learning algorithms on mobile Internet traffic described by the different label granularities and feature sets. The experimental results show that, for flow statistical features, features based on uni-directional flows outperform those based on bi-directional flows; among classification algorithms, random forest and AdaBoost perform better than the others tested; and among clustering algorithms, K-means performs better.
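The abstract contrasts statistical features computed on uni-directional flows with those computed on bi-directional flows. The following minimal sketch illustrates only that distinction, assuming flows are keyed by the usual 5-tuple: a uni-directional key keeps each direction of a connection separate, while a bi-directional key merges both directions. The `Packet` record, the key functions, and the six statistics are hypothetical illustrations, not the feature set extracted by the authors or by MobileGT.

```python
from collections import defaultdict
from statistics import mean, pstdev
from typing import NamedTuple


class Packet(NamedTuple):
    # Hypothetical minimal packet record: timestamp, 5-tuple, and size in bytes.
    ts: float
    src: str
    sport: int
    dst: str
    dport: int
    proto: str
    size: int


def uni_key(p: Packet):
    # Uni-directional flow: client->server and server->client are separate flows.
    return (p.src, p.sport, p.dst, p.dport, p.proto)


def bi_key(p: Packet):
    # Bi-directional flow: sort the two endpoints so both directions share one key.
    lo, hi = sorted([(p.src, p.sport), (p.dst, p.dport)])
    return (lo, hi, p.proto)


def flow_features(packets, key_fn):
    """Group packets into flows with key_fn and compute simple per-flow statistics."""
    flows = defaultdict(list)
    for p in sorted(packets, key=lambda p: p.ts):
        flows[key_fn(p)].append(p)
    feats = {}
    for k, pkts in flows.items():
        sizes = [p.size for p in pkts]
        gaps = [b.ts - a.ts for a, b in zip(pkts, pkts[1:])]  # inter-arrival times
        feats[k] = {
            "pkt_count": len(pkts),
            "byte_count": sum(sizes),
            "mean_size": mean(sizes),
            "std_size": pstdev(sizes),
            "duration": pkts[-1].ts - pkts[0].ts,
            "mean_iat": mean(gaps) if gaps else 0.0,
        }
    return feats


# Usage: three packets of one TCP connection (documentation IP range).
packets = [
    Packet(0.0, "10.0.0.2", 40000, "203.0.113.5", 443, "TCP", 60),
    Packet(0.1, "203.0.113.5", 443, "10.0.0.2", 40000, "TCP", 1500),
    Packet(0.3, "10.0.0.2", 40000, "203.0.113.5", 443, "TCP", 40),
]
uni = flow_features(packets, uni_key)
bi = flow_features(packets, bi_key)
print(len(uni), "uni-directional flows;", len(bi), "bi-directional flow")
```

The same connection yields two uni-directional feature vectors (one per direction) but only one bi-directional vector, which is why the two flow definitions produce different feature sets for the same captured traffic.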
Authors
Huang Yi
Liu Zhen
Wang Ruoyu
Chen Jietong
(School of Medical Information Engineering, Guangdong Pharmaceutical University, Guangzhou 510006, China; Research Center of Information & Network Engineering, South China University of Technology, Guangzhou 510006, China; Communication & Computer Network Lab of Guangdong, Guangzhou 510006, China)
Source
Application Research of Computers
CSCD
Peking University Core Journal
2020, No. 11, pp. 3353-3358 (6 pages)
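Funding
National Natural Science Foundation of China (61501128)
Natural Science Foundation of Guangdong Province (2017A030313345)
National Undergraduate Training Program for Innovation and Entrepreneurship (201710573005, S202010573042)
Fundamental Research Funds for the Central Universities (x2rj/D2174870)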
Keywords
mobile App traffic
machine learning algorithms
classification method
clustering method
data stream