Parallel Implementation Method of Firewall Log Anomaly Detection Based on Improved Random Forest (cited by 1)


Abstract: In the random forest classification algorithm, the decision trees produced during tree generation and voting differ in classification accuracy, so a small number of weak trees can degrade the overall classification performance of the forest. In addition, imbalanced data in the dataset can also reduce the classification accuracy of individual decision trees. To address these shortcomings, constraints are added to the Bootstrap sampling method to reduce the influence of imbalanced data on decision-tree generation, and out-of-bag (OOB) data together with an imbalance coefficient are used to evaluate and weight the generated decision trees. Experimental results show that the proposed algorithm improves the classification accuracy of random forests on imbalanced data.
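The abstract does not give the exact sampling constraint or the formula for the imbalance coefficient, so the Python sketch below rests on two hypothetical choices. First, the constrained Bootstrap draws the same number of samples, with replacement, from every class (the balanced-bootstrap idea). Second, a tree's weight is its out-of-bag accuracy multiplied by a hypothetical imbalance coefficient, the ratio of its worst to its best per-class OOB recall. A one-feature threshold stump stands in for a full decision tree, and random feature subsampling and the Spark parallelization are omitted; all function names are illustrative.

```python
import random
from collections import Counter

def balanced_bootstrap(labels, rng):
    """Hypothetical constrained Bootstrap: draw an equal number of indices,
    with replacement, from each class. Returns (in_bag, out_of_bag)."""
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    per_class = len(labels) // len(by_class)
    in_bag = []
    for idx in by_class.values():
        in_bag.extend(rng.choice(idx) for _ in range(per_class))
    drawn = set(in_bag)
    oob = [i for i in range(len(labels)) if i not in drawn]
    return in_bag, oob

def train_stump(X, y):
    """Toy base learner standing in for a decision tree: the single
    feature/threshold split with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    best_err, best = len(y) + 1, None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            sides = {True: [], False: []}
            for row, lab in zip(X, y):
                sides[row[f] <= t].append(lab)
            # Majority label on each side of the threshold.
            pred = {s: Counter(v).most_common(1)[0][0] if v else majority
                    for s, v in sides.items()}
            err = sum(lab != pred[row[f] <= t] for row, lab in zip(X, y))
            if err < best_err:
                best_err, best = err, (f, t, pred)
    f, t, pred = best
    return lambda row: pred[row[f] <= t]

def tree_weight(model, X, y, oob):
    """OOB accuracy times a hypothetical imbalance coefficient:
    min per-class OOB recall divided by max per-class OOB recall."""
    if not oob:
        return 0.0
    hits, totals = Counter(), Counter()
    for i in oob:
        totals[y[i]] += 1
        hits[y[i]] += model(X[i]) == y[i]
    accuracy = sum(hits.values()) / len(oob)
    recalls = [hits[c] / totals[c] for c in totals]
    coeff = min(recalls) / max(recalls) if max(recalls) > 0 else 0.0
    return accuracy * coeff

def fit_forest(X, y, n_trees=25, seed=0):
    """Train each tree on a constrained bootstrap and weight it by OOB data."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        in_bag, oob = balanced_bootstrap(y, rng)
        model = train_stump([X[i] for i in in_bag], [y[i] for i in in_bag])
        forest.append((model, tree_weight(model, X, y, oob)))
    return forest

def predict(forest, row):
    """Weighted vote: each tree adds its weight to the label it predicts."""
    votes = Counter()
    for model, w in forest:
        votes[model(row)] += w
    return votes.most_common(1)[0][0]
```

For example, with `X = [[i] for i in range(20)]` and a 17:3 imbalanced label list `y = [0] * 17 + [1] * 3`, each bootstrap contains ten samples of each class, and `predict(fit_forest(X, y), [19])` returns the minority label 1. Because the trees are trained independently, the loop in `fit_forest` is the natural unit to distribute across workers in a parallel implementation, though this sketch does not attempt that.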

Authors: Liu Cheng; Wang Jiabin; Hong Jiwei (College of Engineering, Huaqiao University, Quanzhou 362021, China)

Institution: College of Engineering, Huaqiao University

Source: Modern Computer (《现代计算机》), 2023, No. 14, pp. 66-69 (4 pages)

Keywords: Spark; random forest algorithm; intrusion detection; log anomaly detection

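CLC numbers: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]; TP393.08 [Automation and Computer Technology: Computer Application Technology]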


References (1)

[1] Xu Peng, Lin Sen. Traffic classification method based on C4.5 decision tree [J]. Journal of Software, 2009, 20(10): 2692-2704.

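Cited by (1)

[1] Chen Junting, Wang Feiqing. Research on precise mining of abnormal data in industry logs based on an adaptive learning method [J]. Adhesion (粘接), 2024, 51(5): 169-172.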
