Abstract
Deep learning approaches to SQL injection detection tend to overfit when labeled data is scarce. To address this problem, a semi-supervised model, FlexUDA, is proposed. The collected data are first preprocessed by decoding, generalization, and tokenization. The unlabeled data are then augmented according to their computed TF-IDF values, and both the original and the augmented data are vectorized with a fusion of TF-IDF and Word2Vec. Finally, the FlexUDA model is trained and compared with other models. Experimental results show that FlexUDA, trained with only 1,000 labeled samples and 100,000 unlabeled samples, achieves 99.42% accuracy and 99.23% recall. Compared with the supervised models, it generalizes better and effectively mitigates the overfitting caused by insufficient labeled data in SQL injection detection.
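The abstract names the pipeline stages (decoding, generalization, tokenization, TF-IDF-based augmentation) without giving their exact rules, so the following is only a minimal sketch of how such stages might look, not the authors' implementation. The function names `preprocess`, `idf_table`, `tfidf`, and `augment`, the placeholder tokens `0` and `0x0`, the smoothed IDF formula, and the replacement policy (low-TF-IDF tokens are swapped more often, replacements drawn from below-median-IDF tokens) are all assumptions made for illustration.

```python
import math
import random
import re
from collections import Counter
from urllib.parse import unquote_plus


def preprocess(payload):
    """Decode, generalize and tokenize one request payload (assumed rules)."""
    text = unquote_plus(payload).lower()           # URL-decode, normalize case
    text = re.sub(r"0x[0-9a-f]+", "0x0", text)     # generalize hex literals
    text = re.sub(r"\b\d+\b", "0", text)           # generalize decimal literals
    return re.findall(r"\w+|[^\w\s]", text)        # words and single symbols


def idf_table(docs):
    """Smoothed inverse document frequency for every token in the corpus."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    """TF-IDF score of each distinct token in a non-empty document."""
    tf = Counter(doc)
    return {tok: (cnt / len(doc)) * idf[tok] for tok, cnt in tf.items()}


def augment(doc, idf, p=0.3, rng=None):
    """Replace low-TF-IDF tokens; the highest-scoring tokens are never replaced.

    Assumes every token of `doc` appears in `idf` (same corpus).
    """
    if not doc:
        return []
    rng = rng or random.Random(0)
    scores = tfidf(doc, idf)
    top = max(scores.values())
    gaps = [top - scores[tok] for tok in doc]      # large gap = uninformative
    total = sum(gaps)
    if total == 0:                                 # all tokens equally informative
        return list(doc)
    # Hypothetical replacement pool: tokens with below-median IDF.
    cutoff = sorted(idf.values())[len(idf) // 2]
    pool = sorted(tok for tok, value in idf.items() if value <= cutoff)
    out = []
    for tok, gap in zip(doc, gaps):
        prob = min(1.0, p * len(doc) * gap / total)
        out.append(rng.choice(pool) if rng.random() < prob else tok)
    return out


if __name__ == "__main__":
    raw = ["id=1%27%20OR%201%3D1--", "id=42", "name=alice",
           "id=7 union select 0x41"]
    docs = [preprocess(r) for r in raw]
    idf = idf_table(docs)
    print(docs[3])
    print(augment(docs[3], idf, p=0.5))
```

In a consistency-training setup of the UDA kind, each unlabeled sample and its augmented copy would be fed to the model as a pair; vectorization and the FlexUDA training loop are outside the scope of this sketch.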
Authors
WANG Qingyu; WANG Hairui; ZHU Guifu; MENG Shunjian (Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China)
Source
Computer Science (《计算机科学》)
CSCD
Peking University Core Journal (北大核心)
2023, Issue S01, pp. 787-792 (6 pages)
Funding
National Natural Science Foundation of China (61863016, 61263023).
Keywords
SQL injection detection
Semi-supervised learning
Unsupervised data augmentation
Dynamic threshold