
Technology of Mobile Application Identification Based on Density-Based Clustering and Random Forest (cited by: 6)
Abstract: With the rapid growth of mobile devices and the increasing variety of mobile applications, identifying the type of a mobile application has become an important technique in network management, marketing, and network attack prevention. In practice, almost all mobile applications encrypt their data with the SSL/TLS (Secure Sockets Layer/Transport Layer Security) protocol, which makes identification more challenging. This paper proposes a novel technique for identifying Android applications from their encrypted network traffic. The technique uses information entropy to analyze the purity of the clusters generated by the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) clustering algorithm, filters interfering samples out of the dataset using an entropy threshold set experimentally, and finally trains a random forest model on the filtered dataset to identify the application type. Because the method relies only on the transmission patterns of the traffic flows, it is effective for both encrypted and non-encrypted traffic. Experiments show that the method alleviates the misclassification of interfering samples, effectively improves dataset utilization, and achieves higher identification accuracy and recall.
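The entropy-based purity filter described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the threshold value, the `-1` identifier for unclustered points, and the choice to keep or drop those points are all assumptions made for illustration. The sketch takes cluster assignments as given (for example, from a DBSCAN run over flow features), computes the Shannon entropy of the application labels within each cluster, and keeps only samples from clusters whose entropy does not exceed the threshold.

```python
import math
from collections import Counter, defaultdict


def cluster_entropy(app_labels):
    """Shannon entropy (in bits) of the app-label distribution in one cluster."""
    counts = Counter(app_labels)
    total = len(app_labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def filter_by_cluster_entropy(cluster_ids, app_labels, threshold,
                              noise_id=-1, keep_unclustered=True):
    """Return sorted indices of samples that survive the purity filter.

    cluster_ids[i] is the cluster assigned to sample i and app_labels[i] is
    the app that produced it. Clusters with entropy above `threshold` are
    treated as impure (flows shared by many apps) and dropped entirely.
    Assumption: `noise_id` marks points the clustering left unclustered;
    the abstract does not say how these are handled, so it is a flag here.
    """
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    kept = []
    for cid, idxs in members.items():
        if cid == noise_id:
            if keep_unclustered:
                kept.extend(idxs)
            continue
        if cluster_entropy([app_labels[i] for i in idxs]) <= threshold:
            kept.extend(idxs)
    return sorted(kept)


# Hypothetical toy data: cluster 0 is pure, cluster 1 mixes four apps.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, -1]
app_labels = ["a", "a", "a", "a", "b", "c", "d", "b"]
kept = filter_by_cluster_entropy(cluster_ids, app_labels, threshold=0.5)
```

The surviving indices would then select the rows used to train a classifier such as a random forest. In this reading, a low-entropy cluster is dominated by one application and is therefore discriminative, while a high-entropy cluster is assumed to hold traffic common to many applications.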
Authors: ZHU Di (朱迪); CHEN Danwei (陈丹伟) (School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing 210023, China)
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2020, No. 4, pp. 63-68 (6 pages)
Funding: National Natural Science Foundation of China (No. 61672016)
Keywords: encrypted traffic analysis; DBSCAN; random forest