Phishing Website Detection Based on mRMR-RF Feature Selection and XGBoost Model (cited by: 12)
Abstract: Large amounts of redundant data lower the accuracy and raise the misjudgment rate of phishing website detection. To address this, we propose a feature selection method (mRMR-RF) that combines maximum relevance minimum redundancy (mRMR) with random forest (RF), and we build the phishing website detection model with the extreme gradient boosting (XGBoost) algorithm. First, the mRMR and RF algorithms each rank the features. Second, the two rankings are combined into a final ranking, and the optimal feature subset for the XGBoost model is selected according to the best number of features determined experimentally. Finally, the XGBoost classification model is trained on the optimal feature subset. Experimental results show that the method improves phishing website detection accuracy compared with other classification methods and has practical significance.
Authors: Bi Qingsong; Liang Xuechun; Chen Shuqi (College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211816, Jiangsu, China)
Source: Computer Applications and Software (《计算机应用与软件》), Peking University Core Journal, 2020, Issue 9, pp. 296-301 (6 pages)
Funding: Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX19-0874).
Keywords: Feature selection; Maximum relevance minimum redundancy; Random forest; XGBoost; Phishing website
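
The ranking-and-fusion step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say which mRMR criterion is used (the difference form, relevance minus mean redundancy, is assumed here), how mutual information is estimated (a plain count-based estimate on discrete features is assumed), or how the two rankings are combined (the sum of rank positions is assumed, with ties broken by the mRMR order). The feature names, the toy data, and the hand-written `rf_rank` list are hypothetical. In practice `rf_rank` would come from random forest feature importances, and the selected subset would then be used to train an XGBoost classifier; both steps are omitted to keep the sketch dependency-free.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Count-based mutual information (in bits) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def mrmr_ranking(features, labels):
    """Greedy mRMR (assumed difference form): at each step pick the feature
    with the highest relevance minus mean redundancy to those already picked."""
    relevance = {f: mutual_information(v, labels) for f, v in features.items()}
    selected, remaining = [], sorted(features)

    def score(f):
        if not selected:
            return relevance[f]
        redundancy = sum(mutual_information(features[f], features[s])
                         for s in selected) / len(selected)
        return relevance[f] - redundancy

    while remaining:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def fuse_rankings(rank_a, rank_b):
    """Assumed fusion rule: order by summed rank position, ties by rank_a."""
    pos_a = {f: i for i, f in enumerate(rank_a)}
    pos_b = {f: i for i, f in enumerate(rank_b)}
    return sorted(rank_a, key=lambda f: (pos_a[f] + pos_b[f], pos_a[f]))


# Hypothetical toy data: 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "url_has_ip":      [1, 1, 1, 0, 0, 0, 0, 0],
    "url_has_ip_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # exact duplicate, fully redundant
    "has_popup":       [1, 1, 0, 1, 0, 0, 1, 0],
}

mrmr_rank = mrmr_ranking(features, labels)
# Hypothetical stand-in for a ranking by random forest feature importance.
rf_rank = ["url_has_ip", "url_has_ip_copy", "has_popup"]

fused = fuse_rankings(mrmr_rank, rf_rank)
k = 2  # in the paper the best feature count is found experimentally
selected = fused[:k]
print(mrmr_rank, fused, selected)
```

The duplicated feature shows why the redundancy term matters: it is as relevant to the label as the original, yet mRMR places it last because it adds no new information.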
