Abstract
With the development of network technologies and applications, Web servers have become a primary target for attackers. Traditional Web intrusion detection systems based on regular-expression matching suffer from rule bases that are hard to maintain and feature bases that grow bloated. Conventional detection models based on machine learning also have drawbacks: their features must be extracted by hand through a complex process, and their recognition rate remains low. To address these problems, this paper proposes a Web attack traffic detection model built on TF-IDF and random forest. The model applies the TF-IDF algorithm at both the word level and the character level, combines the two resulting term-frequency matrices into feature vectors so that payload features are extracted automatically, and then classifies the vectors with the random forest algorithm to separate normal traffic from attack traffic. Experimental results show that the model reaches a detection rate of 98.7% on attack traffic. The results also show that the model achieves automatic feature extraction and simplifies the detection method, which makes it well suited to detecting malicious Web traffic.
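The feature-extraction step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tokenization rules, the character bigram size, the `log(N/df) + 1` IDF variant, the helper names, and the sample payloads are all hypothetical choices made for the example. The random forest classification step is not shown; in practice the resulting vectors would be passed to a random forest implementation from a machine learning library.

```python
import math
import re
from collections import Counter


def word_tokens(payload):
    # Assumption: words are runs of letters, digits, or underscores.
    return [t for t in re.split(r"[^0-9a-z_]+", payload.lower()) if t]


def char_tokens(payload, n=2):
    # Assumption: character-level tokens are overlapping n-grams (bigrams).
    s = payload.lower()
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def tfidf_matrix(docs_tokens):
    """Return (vocabulary, rows) where rows[i][j] is the TF-IDF weight
    of vocabulary term j in document i."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))  # document frequency: count each term once per doc
    vocab = sorted(df)
    # Assumed IDF variant: log(N / df) + 1, so no term gets weight zero.
    idf = {t: math.log(n_docs / df[t]) + 1.0 for t in vocab}
    rows = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty payloads
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, rows


def combined_features(payloads):
    # Build word-level and character-level matrices, then concatenate
    # the two rows of each payload into a single feature vector.
    w_vocab, w_rows = tfidf_matrix([word_tokens(p) for p in payloads])
    c_vocab, c_rows = tfidf_matrix([char_tokens(p) for p in payloads])
    names = ["w:" + t for t in w_vocab] + ["c:" + t for t in c_vocab]
    return names, [w + c for w, c in zip(w_rows, c_rows)]


# Hypothetical sample payloads: two benign, two attack-like.
payloads = [
    "id=1&name=alice",
    "id=2&name=bob",
    "id=1' or '1'='1",
    "q=<script>alert(1)</script>",
]
names, X = combined_features(payloads)
```

Each row of `X` is one payload's feature vector, and `names` records which column came from the word-level (`w:`) or character-level (`c:`) vocabulary.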
Authors
Zhu Pengcheng (祝鹏程)
Fang Yong (方勇)
Huang Cheng (黄诚)
Liu Qiang (刘强)
College of Electronics and Information, Sichuan University, Chengdu 610065; College of Cybersecurity, Sichuan University, Chengdu 610207
Source
Journal of Information Security Research (《信息安全研究》)
2018, No. 11, pp. 1040-1045 (6 pages)