Abstract
Traditional malicious web page detection lacks a global, systematic view: rather than treating a web page as an organic whole, it studies features at individual levels, such as tag structure, URL, and text content, in isolation, which results in low accuracy. Although some researchers have proposed fusing features, they still implement the idea with conventional machine learning algorithms, so the feature engineering workload is heavy and detection efficiency is low. To address these problems, this paper proposes Tri-BERT-SENet, a model based on multi-feature fusion, for malicious web page detection. The HTML features, URL features, and text features extracted from a web page are each passed through a BERT model, exploiting BERT's context awareness, to produce three BERT outputs. These outputs are then treated as feature channels and weighted by SENet, which yields the final detection result. Experimental results show that the method substantially improves detection accuracy compared with traditional machine learning models and with BERT applied to a single feature type.
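The channel-weighting step described in the abstract can be illustrated with a minimal sketch of a squeeze-and-excitation (SE) block applied to three feature channels. This is not the authors' implementation: the three short vectors stand in for BERT outputs, and the names `se_reweight`, `W1`, and `W2`, the vector length, and all weight values are assumptions chosen only for illustration. In a trained model the weights would be learned.

```python
import math

def squeeze(channels):
    """Squeeze step: global average pooling, one scalar per channel."""
    return [sum(c) / len(c) for c in channels]

def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def excite(z, w1, w2):
    """Excitation step: bottleneck layer with ReLU, then sigmoid gates."""
    hidden = [max(0.0, h) for h in matvec(w1, z)]
    return [1.0 / (1.0 + math.exp(-g)) for g in matvec(w2, hidden)]

def se_reweight(channels, w1, w2):
    """Return the per-channel scales and the rescaled channels."""
    scales = excite(squeeze(channels), w1, w2)
    scaled = [[s * x for x in c] for s, c in zip(scales, channels)]
    return scales, scaled

# Placeholder vectors standing in for the three BERT outputs (assumed values).
html_vec = [0.25, 0.5, 0.75, 0.5]
url_vec = [1.0, 1.0, 1.0, 1.0]
text_vec = [-0.5, 0.0, 0.5, 0.0]

# Hypothetical fixed weights: 3 channels -> 2 hidden units -> 3 gates.
W1 = [[0.5, -0.25, 1.0], [0.1, 0.3, -0.2]]
W2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 1.0]]

scales, weighted = se_reweight([html_vec, url_vec, text_vec], W1, W2)
print("channel scales:", scales)
```

Each scale lies strictly between 0 and 1, so the block can down-weight a less informative channel without discarding it. A classifier head, not shown here, would then consume the rescaled channels.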
Authors
杨立圣
罗文华
YANG Li-sheng; LUO Wen-hua (School of Public Security Information Technology and Intelligence, Criminal Investigation Police University of China, Shenyang 110035, China)
Source
《小型微型计算机系统》
CSCD
PKU Core Journal (北大核心)
2023, No. 4, pp. 875-880 (6 pages)
Journal of Chinese Computer Systems
Funding
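Supported by the National Key Research and Development Program of China (2018YFC0830600)
Supported by the Liaoning Province "Hundred, Thousand, Ten Thousand Talents Project" (2020921058)
Supported by the Graduate Innovation Ability Improvement Project of the Criminal Investigation Police University of China (2022YCZD05)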