Research on Phishing URL Detection Technology Based on CNN-BiLSTM

Cited by: 5
Abstract: To address the increasingly serious problem of phishing, a phishing URL detection method based on a convolutional neural network (CNN) and a bidirectional long short-term memory network (BiLSTM), named CNN-BiLSTM, is proposed. The method first segments each URL using sensitive-word-based segmentation: the URL is split at the word level according to special characters and sensitive words, and the remaining non-sensitive words are split at the character level. This preserves the useful information carried by special characters and sensitive words and makes fuller use of the URL data. The segmented URL is then fed into the CNN and the BiLSTM: the CNN extracts local spatial features of the URL, the BiLSTM extracts bidirectional long-range dependency features, and phishing webpages are detected from these automatically extracted features. Experimental results show that the method achieves good detection performance, with an accuracy of 98.84%, a precision of 99.71%, a recall of 98.04%, and an F1 score of 98.86%. Compared with traditional machine learning and blacklist-based detection methods, it requires no manual feature extraction and can identify newly emerging phishing webpages.
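Authors: BU Youjun (卜佑军); ZHANG Qiao (张桥); CHEN Bo (陈博); ZHANG Surong (张稣荣); WANG Fangyu (王方玉). Affiliations: PLA Strategic Support Force Information Engineering University, Zhengzhou 450001, China; Zhongyuan Network Security Research Institute, Zhengzhou University, Zhengzhou 450001, China
Source: Journal of Zhengzhou University (Engineering Science) (《郑州大学学报(工学版)》), indexed in CAS and the Peking University Core Journals list (北大核心), 2021, No. 6, pp. 14-20 (7 pages)
Funding: National Key Research and Development Program of China (2017YFB0803201); National Natural Science Foundation of China (61572519).
Keywords: phishing URL; URL segmentation; convolutional neural network (CNN); bidirectional long short-term memory network (BiLSTM)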
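
The segmentation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the sensitive-word vocabulary or the exact set of special characters, so the `SENSITIVE_WORDS` list, the treatment of every non-alphanumeric ASCII character as "special", and the function name `segment_url` are all assumptions made for illustration. The sketch keeps special characters and sensitive words as whole tokens and breaks everything else into single characters.

```python
import re
from typing import List

# Hypothetical sensitive-word list (an assumption; the paper's actual
# vocabulary is not stated in the abstract).
SENSITIVE_WORDS = ("login", "secure", "account", "update", "verify", "paypal", "bank")

# Longest words first, so that when two candidates start at the same
# position the longer one is preferred.
_SENSITIVE_RE = re.compile(
    "|".join(re.escape(w) for w in sorted(SENSITIVE_WORDS, key=len, reverse=True)),
    re.IGNORECASE,
)


def segment_url(url: str) -> List[str]:
    """Split a URL into special characters, sensitive words, and single characters."""
    tokens: List[str] = []
    # Break the URL into runs of ASCII letters/digits and single other characters.
    for chunk in re.findall(r"[A-Za-z0-9]+|[^A-Za-z0-9]", url):
        if not (chunk.isascii() and chunk.isalnum()):
            tokens.append(chunk)  # special character kept as its own token
            continue
        pos = 0
        for match in _SENSITIVE_RE.finditer(chunk):
            tokens.extend(chunk[pos:match.start()])  # non-sensitive text, char by char
            tokens.append(match.group().lower())     # sensitive word kept whole
            pos = match.end()
        tokens.extend(chunk[pos:])  # trailing non-sensitive characters
    return tokens
```

With this assumed word list, `segment_url("paypal-login.com")` returns `['paypal', '-', 'login', '.', 'c', 'o', 'm']`. In a pipeline like the one the abstract outlines, such tokens would then be mapped to integer indices and embedded before being passed to the CNN and BiLSTM layers; that part is not shown here.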
