
Research on Detection and Classification of Fraudulent Websites Based on Multi-Dimensional Features
Abstract: With the development and widespread use of the Internet, the tactics and anti-detection techniques of fraud groups have become increasingly sophisticated, and the detection and classification of fraudulent websites matter more than ever for cyberspace security. Traditional detection techniques can no longer cope with new types of fraudulent websites, and little research addresses the classification of such sites. To address this problem, this paper analyzes several typical features of current fraudulent websites and proposes a detection and classification system based on multi-dimensional features. The system represents a fraudulent website using 11 types of website features and 3600 web page keywords. It first uses a crawler to obtain the web page screenshot, WHOIS information, and source code of the domain under test, and passes them to a feature extraction module that builds the multi-dimensional feature set. The detection module takes the website domain name, code structure, and WHOIS information as features and builds a random forest model to perform detection. Then, based on the detection results, the web page classification module uses a bidirectional GRU to extract textual features of the page; when the confidence is below 0.7, it uses a BERT model instead, which balances accuracy and efficiency. A residual neural network extracts features from the page screenshot, the similarity between images inside the page and the website logo is computed, and a random forest model performs the classification. Comparison experiments were designed to further analyze the accuracy of the model. The experiments show that the proposed model is highly accurate, with an average F1-score of 97.28%. The results also show that the multi-dimensional feature model distinguishes fraudulent websites from legitimate ones well, overcomes the difficulty traditional methods have in identifying new fraudulent websites, and is suitable for the rapid detection and classification of fraudulent websites among newly registered domains worldwide.
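The confidence-based fallback described in the abstract (bidirectional GRU first, BERT when confidence is below 0.7) can be illustrated with a minimal sketch. Only the 0.7 threshold comes from the abstract. The function names, the category labels, and the toy stand-in models are hypothetical, and the sketch assumes the BERT prediction simply replaces the low-confidence one, which the abstract does not specify.

```python
from typing import Callable, Dict, Tuple

# A text classifier maps page text to a probability per category label.
Classifier = Callable[[str], Dict[str, float]]

CONFIDENCE_THRESHOLD = 0.7  # threshold stated in the abstract


def classify_with_fallback(
    text: str,
    fast_model: Classifier,
    strong_model: Classifier,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> Tuple[str, float, str]:
    """Return (label, confidence, model_used).

    The cheaper model runs first; the costlier model runs only when the
    cheaper model's top probability is below the threshold.
    """
    probs = fast_model(text)
    label, conf = max(probs.items(), key=lambda kv: kv[1])
    if conf >= threshold:
        return label, conf, "fast"
    # Assumption: the stronger model's prediction replaces the weak one.
    probs = strong_model(text)
    label, conf = max(probs.items(), key=lambda kv: kv[1])
    return label, conf, "strong"


# Hypothetical stand-ins for the BiGRU and BERT text classifiers.
# The labels "gambling" and "investment" are illustrative only.
def toy_bigru(text: str) -> Dict[str, float]:
    if "casino" in text:
        return {"gambling": 0.9, "investment": 0.1}
    return {"gambling": 0.55, "investment": 0.45}


def toy_bert(text: str) -> Dict[str, float]:
    return {"gambling": 0.2, "investment": 0.8}


if __name__ == "__main__":
    print(classify_with_fallback("casino bonus", toy_bigru, toy_bert))
    print(classify_with_fallback("high yield fund", toy_bigru, toy_bert))
```

The design point is that most pages are settled by the cheaper model, so the more expensive model is paid for only on ambiguous inputs.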
Authors: YOU Chang (游畅); HUANG Cheng (黄诚); TIAN Xuan (田璇); YAN Wei (燕玮); LENG Tao (冷涛). Affiliations: School of Cyber Science and Engineering, Sichuan University, Chengdu 610065, China; The 6th Research Institute of China Electronics Corporation, Beijing 102209, China; Intelligent Policing Key Laboratory of Sichuan Province, Sichuan Police College, Luzhou 646099, China; Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100864, China
Source: Journal of Sichuan University (Natural Science Edition) (《四川大学学报(自然科学版)》), 2024, No. 4, pp. 27-36 (10 pages). Indexed in: CAS, CSCD, PKU Core (北大核心)
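Funding: Open Project of the Intelligent Policing Key Laboratory of Sichuan Province (ZNJW2024KFZD003); Applied Basic Research Project of the Sichuan Provincial Department of Science and Technology (2022NSFSC0752).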
Keywords: Fraudulent website detection; Website classification; Random forest; Deep learning