Abstract
HTTPS authenticates the identity of web servers and protects the confidentiality and integrity of exchanged data. However, some malicious actors use HTTPS pages to spread harmful content, which poses new challenges for traffic management and security analysis. Accurately identifying SSL/TLS-based HTTPS encrypted applications is therefore important for improving network quality of service, optimizing bandwidth allocation, and strengthening security control. Most existing methods focus on directly identifying websites and applications and pay little attention to the hierarchical structure of the categories. This paper proposes a method that follows the tree-shaped hierarchy of HTTPS application categories and classifies traffic top-down, layer by layer. At the top level, a flow is assigned to its major category according to the association between signatures and sample flows. At the second level, feature values are extracted from the flow under inspection, and a random forest model classifies it into the corresponding lowest-level subclass. Experimental results show that the method overcomes the high classification error of direct identification methods and improves the precision of service identification.
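The two-level flow described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration, since the abstract does not specify it: the signatures are hypothetical hostname suffixes matched against the TLS SNI, the features are a hypothetical pair (mean packet size, mean inter-arrival time), and a nearest-centroid rule stands in for the random forest model so that the example needs only the standard library.

```python
from math import dist

# Hypothetical top-level signatures: SNI hostname suffix -> major category.
SIGNATURES = {
    "video.example": "video",
    "mail.example": "mail",
}

# Hypothetical labelled sample flows per major category.
# Each feature vector is (mean packet size in bytes, mean inter-arrival time in ms).
TRAINING = {
    "video": {
        "live": [(1200.0, 5.0), (1300.0, 6.0)],
        "on_demand": [(1400.0, 20.0), (1350.0, 25.0)],
    },
    "mail": {
        "webmail": [(400.0, 80.0), (450.0, 90.0)],
        "attachment": [(1400.0, 10.0), (1380.0, 12.0)],
    },
}


def centroid(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(column) / n for column in zip(*vectors))


# One second-level model per major category. Here a model is simply a table of
# subclass centroids; the paper uses a random forest at this level instead.
CENTROIDS = {
    category: {sub: centroid(vs) for sub, vs in subclasses.items()}
    for category, subclasses in TRAINING.items()
}


def top_level(sni):
    """Level 1: map a flow to a major category by signature match, or None."""
    for suffix, category in SIGNATURES.items():
        if sni.endswith(suffix):
            return category
    return None


def classify(sni, features):
    """Top-down classification: major category first, then subclass within it."""
    category = top_level(sni)
    if category is None:
        return None  # no signature matched, so the flow stays unclassified
    subclasses = CENTROIDS[category]
    # Level 2: pick the subclass whose centroid is closest to the flow's features.
    subclass = min(subclasses, key=lambda s: dist(features, subclasses[s]))
    return category, subclass
```

The point of the structure is that the second-level model only has to separate the subclasses of one major category, rather than all leaf classes at once. Swapping the centroid table for a trained random forest per category would keep `classify` unchanged in shape.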
Authors
张磊
赵辉
ZHANG Lei; ZHAO Hui (College of Cybersecurity, Sichuan University, Chengdu 610065, China)
Source
《网络新媒体技术》
2020, No. 3, pp. 14-20 (7 pages)
Network New Media Technology
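Funding
National Key Research and Development Program of China (2016YFB0800604, 2016YFB0800605)
National Natural Science Foundation of China (61572334, U1736212)
Sichuan Province Key Research and Development Project (2018GZ0183)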