Abstract
As the analysis and characterization of phishing websites has matured, the features used to detect them have become high-dimensional. Growth in both the number of attributes and the volume of data causes the computational cost of detection to grow geometrically, leading to high time complexity, heavy resource usage, and low detection efficiency. To address multi-attribute phishing website detection, this paper proposes a detection method based on attribute dimensionality reduction. First, information gain is used to select features from the original data, filtering out potentially redundant and noisy attributes. Second, an attribute correlation matrix is computed from the mutual information between pairs of attributes and used as the weights in a weighted principal component analysis (PCA). Finally, a supervised learning algorithm builds the phishing website detection model from the new, reduced-dimension features. Experiments show that the method effectively reduces interference from redundant and noisy attributes in the original data, detects phishing websites in complex network environments, and is highly stable.
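The three stages named in the abstract (information-gain feature selection, mutual-information-weighted PCA, supervised classification) can be sketched as follows. This is only an illustrative sketch, not the authors' implementation. The abstract does not state how the correlation matrix enters the PCA, so the element-wise weighting of the covariance matrix by normalized mutual information is an assumption. The nearest-centroid classifier, the synthetic data, and all function names are likewise hypothetical stand-ins.

```python
import numpy as np

def entropy(x):
    """Shannon entropy in bits of a discrete 1-D array."""
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for two discrete 1-D arrays."""
    _, counts = np.unique(np.stack([x, y], axis=1), axis=0, return_counts=True)
    p = counts / counts.sum()
    return entropy(x) + entropy(y) + float((p * np.log2(p)).sum())

def select_by_information_gain(X, y, k):
    """Indices of the k attributes with the largest information gain.
    IG(y; x_j) = H(y) - H(y | x_j), which equals I(x_j; y)."""
    gains = np.array([mutual_information(X[:, j], y) for j in range(X.shape[1])])
    return np.argsort(gains)[::-1][:k]

def fit_weighted_pca(X, n_components):
    """ASSUMED weighting scheme: scale each covariance entry by the
    normalized mutual information of the corresponding attribute pair."""
    d = X.shape[1]
    mi = np.array([[mutual_information(X[:, i], X[:, j]) for j in range(d)]
                   for i in range(d)])
    h = np.sqrt(np.diag(mi))                  # I(X;X) = H(X)
    weights = mi / (np.outer(h, h) + 1e-12)   # diagonal entries are ~1
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-12
    cov = np.cov((X - mean) / std, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(weights * cov)
    top = np.argsort(eigvals)[::-1][:n_components]
    return mean, std, eigvecs[:, top]

def transform(X, mean, std, components):
    """Standardize with the training statistics, then project."""
    return ((X - mean) / std) @ components

def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Stand-in supervised learner: assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([Z_train[y_train == c].mean(axis=0) for c in classes])
    dists = ((Z_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dists.argmin(axis=1)]

# Synthetic demo data (not from the paper): 2 informative + 4 noise attributes.
rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, size=n)
X = rng.integers(0, 2, size=(n, 6))
X[:, 0] = np.where(rng.random(n) < 0.9, y, 1 - y)
X[:, 1] = np.where(rng.random(n) < 0.8, y, 1 - y)
y_train, y_test = y[:300], y[300:]

# Stage 1: feature selection, fitted on the training split only.
selected = select_by_information_gain(X[:300], y_train, k=4)
X_train, X_test = X[:300][:, selected], X[300:][:, selected]

# Stage 2: weighted PCA; Stage 3: supervised classification.
mean, std, components = fit_weighted_pca(X_train, n_components=2)
Z_train = transform(X_train, mean, std, components)
Z_test = transform(X_test, mean, std, components)
accuracy = float((nearest_centroid_predict(Z_train, y_train, Z_test) == y_test).mean())
print("selected attributes:", sorted(selected.tolist()))
print("test accuracy:", round(accuracy, 3))
```

Selection and PCA statistics are fitted on the training split only, so the held-out samples do not influence the reduced feature space. Any other supervised learner could replace the nearest-centroid step.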
Authors
杨云
徐光侠
雷娟
YANG Yun; XU Guangxia; LEI Juan (State Grid Chongqing Electric Power Company, Chongqing 400014, P.R. China; Information and Communication Engineering Postdoctoral Research Station, Chongqing University, Chongqing 400044; School of Software Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, P.R. China; State Grid Chongqing Electric Power Co. Electric Power Research Institute, Chongqing 401123, P.R. China)
Source
《重庆邮电大学学报(自然科学版)》
CSCD
PKU Core (Peking University core journal list)
2018, No. 4, pp. 564-571 (8 pages)
Journal of Chongqing University of Posts and Telecommunications (Natural Science Edition)
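Funding
National Natural Science Foundation of China (61772099)
China Postdoctoral Science Foundation (2014M562282)
Chongqing Postdoctoral Project (XM2014039)
Chongqing Major Theme Special Project for Artificial Intelligence Technology Innovation (cstc2017rgzn-zdyf0140)
Chongqing University Outstanding Achievement Transformation Funding (KJZH17116)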
Keywords
multi-attribute
dimensionality reduction
phishing website
detection