Abstract
The traditional TF-IDF algorithm does not assign appropriate weights to feature words. This leads to insufficient feature extraction and low efficiency, and the results do not reflect the actual data. To address these limitations in SQL injection attack detection, this paper improves TF-IDF by adding a text quantity ratio factor and the chi-square (CHI) statistic to the traditional algorithm, which better adjusts the weights of important terms. SQL injection detection is then performed with several different classifiers, and their classification results are compared. Experimental results show that combining a Boosted Decision Tree with the improved TF-IDF achieves higher precision, recall, and F1 score than other comparable methods. In addition, compared with the traditional TF-IDF algorithm, the proposed algorithm improves the accuracy, precision, recall, and F1 score of SQL injection detection by about 5%, which indicates practical application potential.
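The abstract does not give the improved weighting formula. As a rough illustration only, the Python sketch below shows one common way to combine TF-IDF with the chi-square statistic: each term's TF-IDF weight is multiplied by its largest CHI(t, c) over the classes, where CHI(t, c) is computed from the 2x2 table of class membership versus term occurrence. This combination is an assumption, not the authors' method. The paper's text quantity ratio factor is not defined in the abstract and is therefore left out. The function names `chi_square` and `tfidf_chi` and the toy token lists are hypothetical.

```python
import math
from collections import Counter


def chi_square(term, docs, labels, target):
    """CHI(t, c) from the 2x2 contingency table of term occurrence vs. class."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1  # document in class c that contains t
        elif has_term:
            b += 1  # document outside c that contains t
        elif in_class:
            c += 1  # document in class c without t
        else:
            d += 1  # document outside c without t
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def tfidf_chi(docs, labels):
    """Return one {term: weight} dict per (non-empty) tokenized document.

    Assumption (hypothetical, not the paper's exact formula):
    weight = tf * idf * max over classes of CHI(t, c).
    """
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in docs for term in set(tokens))
    classes = set(labels)
    chi = {t: max(chi_square(t, docs, labels, cl) for cl in classes) for t in df}
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        vec = {}
        for term, cnt in counts.items():
            tf = cnt / len(tokens)             # relative term frequency
            idf = math.log(n_docs / df[term])  # classic unsmoothed IDF
            vec[term] = tf * idf * chi[term]
        weights.append(vec)
    return weights


if __name__ == "__main__":
    # Toy tokenized requests: 1 = injection, 0 = normal (illustrative data only).
    docs = [
        ["id", "=", "1", "or", "1", "=", "1"],
        ["union", "select", "password", "from", "users"],
        ["id", "=", "42"],
    ]
    labels = [1, 1, 0]
    for vec in tfidf_chi(docs, labels):
        print(sorted(vec.items(), key=lambda kv: -kv[1])[:3])
```

In a pipeline like the one the abstract describes, these per-document weight dictionaries would be turned into fixed-length vectors and passed to a classifier such as a boosted decision tree; that step is not shown here.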
Authors
GUAN Hui (关慧)
SHENG Jing-yuan (盛靖媛)
CAO Tong-zhou (曹同洲)
Affiliations: College of Computer Science and Technology, Shenyang University of Chemical Technology, Shenyang 110142, China; Key Laboratory of Industrial Intelligence Technology on Chemical Process, Liaoning Province, Shenyang 110142, China
Source
Computer and Modernization (《计算机与现代化》)
2022, No. 6, pp. 122-126 (5 pages)
Funding
Basic Scientific Research Project of Higher Education Institutions, Education Department of Liaoning Province (LJKZ0434).