Abstract
To address the inadequate accuracy and high misclassification rate of phishing website detection caused by large amounts of redundant data, we propose a feature selection method (mRMR-RF) that combines maximum relevance minimum redundancy (mRMR) with random forest (RF), and use the extreme gradient boosting (XGBoost) algorithm to build the phishing website detection model. First, the mRMR and RF algorithms each rank the features. The two rankings are then combined into a final ranking, and the optimal feature subset for the XGBoost model is selected according to the best number of features determined experimentally. Finally, the XGBoost classification model is trained on this optimal feature subset. Experimental results show that, compared with other classification methods, the proposed method improves the accuracy of phishing website detection and is of practical value.
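The ranking-and-fusion steps summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the two rankings are combined, so the rank-sum rule in `fuse_rankings` is an assumption, as are the toy feature names, the difference-form mRMR score, and the hard-coded `rf_ranking` that stands in for an ordering by random forest importance. Training XGBoost on the selected subset is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_ranking(columns, labels):
    """Greedy mRMR ordering: repeatedly pick the feature with the highest
    relevance to the labels minus mean redundancy with features already picked."""
    relevance = {name: mutual_information(col, labels) for name, col in columns.items()}
    selected, remaining = [], list(columns)
    while remaining:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(columns[name], columns[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


def fuse_rankings(ranking_a, ranking_b):
    """Assumed fusion rule: sum each feature's position in both rankings
    (lower is better), breaking ties by position in ranking_a."""
    pos_a = {f: i for i, f in enumerate(ranking_a)}
    pos_b = {f: i for i, f in enumerate(ranking_b)}
    return sorted(ranking_a, key=lambda f: (pos_a[f] + pos_b[f], pos_a[f]))


# Toy, made-up data: 1 = phishing, 0 = legitimate.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "has_ip": [1, 1, 1, 0, 0, 0, 0, 0],
    "has_ip_dup": [1, 1, 1, 0, 0, 0, 0, 0],  # exact copy, i.e. fully redundant
    "long_url": [1, 0, 1, 1, 0, 1, 0, 0],
}

mrmr_order = mrmr_ranking(columns, labels)
# Hypothetical stand-in for a ranking by random forest feature importance.
rf_ranking = ["has_ip", "has_ip_dup", "long_url"]
final_order = fuse_rankings(mrmr_order, rf_ranking)

k = 2  # in the paper, the best feature count is determined experimentally
best_subset = final_order[:k]
print(mrmr_order, final_order, best_subset)
```

In this toy case the duplicated feature is pushed to the end of the mRMR ordering even though an importance-based ranking may place it near the top, which is the kind of redundancy the combined ranking is meant to filter out before the classifier is trained.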
Authors
Bi Qingsong (毕青松)
Liang Xuechun (梁雪春)
Chen Shuqi (陈舒期)
College of Electrical Engineering and Control Science, Nanjing Tech University, Nanjing 211816, Jiangsu, China
Source
Computer Applications and Software (《计算机应用与软件》)
Peking University Core Journal (北大核心)
2020, No. 9, pp. 296-301 (6 pages)
Funding
Postgraduate Research & Practice Innovation Program of Jiangsu Province (KYCX19-0874).
Keywords
Feature selection
Maximum relevance minimum redundancy (mRMR)
Random forest
XGBoost
Phishing website