Method for Quickly Identifying Phishing Websites Based on Feature Selection Model
Cited by: 4
Abstract: Research on phishing website identification places increasingly high demands on recognition speed. This paper therefore proposes a fast phishing website identification method based on a hybrid feature selection model. The model consists of three main parts: primary feature selection, secondary feature selection, and classification. It is built by combining information gain and the chi-square test together with a random-forest-based recursive feature elimination algorithm. Within the model, a distribution function and gradient are used to obtain the optimal cutoff threshold and thus the optimal dataset, improving the efficiency of phishing website identification. Experimental results show that, after feature selection with the hybrid model, dataset dimensionality is reduced by 79.2% and classification time complexity by 32% with almost no loss of classification accuracy, effectively improving classification efficiency. In addition, when the model is evaluated on the large phishing dataset from the UCI Machine Learning Repository, classification accuracy drops by 1.7%, but dataset dimensionality is reduced by 70% and classification time complexity by 41.1%.
Authors: CHEN Peng (陈鹏); LI Yong-zhi (李勇志); YU Xiao-sheng (余肖生) (School of Computer and Information, Three Gorges University, Yichang 443002, China)
Source: Computer Technology and Development (《计算机技术与发展》), 2021, No. 4, pp. 40-45 (6 pages)
Funding: National Key Research and Development Program of China (2016YFC0802500).
Keywords: feature selection; information gain; chi-square test; random forest; recursive feature elimination
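
The primary feature selection stage named in the abstract (information gain combined with the chi-square test) can be illustrated with a minimal sketch. The abstract does not say how the two scores are combined or how the cutoff threshold is derived from the distribution function and gradient, so the averaging of normalized scores and the fixed `keep_ratio` below are assumptions made purely for illustration. The feature names and toy data are hypothetical, and the random-forest-based recursive feature elimination stage is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    n = len(labels)
    fx, fy = Counter(feature), Counter(labels)
    observed = Counter(zip(feature, labels))
    stat = 0.0
    for x in fx:
        for y in fy:
            expected = fx[x] * fy[y] / n
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat


def primary_selection(columns, labels, keep_ratio=0.5):
    """Rank features by the mean of their max-normalized IG and chi-square scores.

    ASSUMPTION: both the averaging rule and the fixed keep_ratio cutoff are
    placeholders; the paper's threshold rule is not given in the abstract.
    """
    ig = {name: information_gain(col, labels) for name, col in columns.items()}
    chi = {name: chi_square(col, labels) for name, col in columns.items()}
    max_ig = max(ig.values()) or 1.0    # avoid division by zero
    max_chi = max(chi.values()) or 1.0
    combined = {
        name: 0.5 * ig[name] / max_ig + 0.5 * chi[name] / max_chi
        for name in columns
    }
    ranked = sorted(combined, key=combined.get, reverse=True)
    k = max(1, round(len(ranked) * keep_ratio))
    return ranked[:k], combined


# Hypothetical toy data: 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "has_ip_in_url": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly predictive
    "has_at_symbol": [1, 1, 0, 0, 0, 0, 0, 0],  # partially predictive
    "uses_https":    [0, 0, 0, 1, 1, 1, 1, 0],  # weakly predictive
    "long_url":      [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}

selected, scores = primary_selection(columns, labels, keep_ratio=0.5)
print(selected)  # → ['has_ip_in_url', 'has_at_symbol']
```

In a full pipeline along the lines the abstract describes, the features that survive this filter step would then go to the secondary selection stage (recursive feature elimination driven by random forest feature importances) before classification.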
