Abstract
To detect malicious phishing effectively, a phishing detection method based on uniform resource locator (URL) features is proposed. The method first compares known phishing URLs with legitimate ones and extracts the salient features that distinguish phishing URLs. A machine learning algorithm is then trained on the sample data set to obtain a classification model, which is used to classify the URLs under inspection. Because phishing URLs change over time, the classification model must be updated continually as new samples arrive; to this end, an incremental learning algorithm based on feedback from the original sample data is designed. Experiments show that combining the extracted URL features with the support vector machine (SVM) classification algorithm achieves high phishing detection accuracy, and that the incremental learning algorithm is effective.
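The abstract does not list the URL features the method uses. As a minimal sketch of what the feature-extraction step could look like, the function below maps a URL to a few numeric lexical cues. The feature names and the function `extract_url_features` are assumptions chosen for illustration and are not taken from the paper.

```python
import re
from urllib.parse import urlparse

# Hypothetical feature set: common lexical cues, NOT the paper's actual features.
_IPV4 = re.compile(r"^\d{1,3}(\.\d{1,3}){3}$")


def extract_url_features(url: str) -> dict:
    """Map a URL to a small numeric feature dictionary (illustrative only)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        # Total number of characters in the URL.
        "url_length": len(url),
        # Number of dots in the host name.
        "host_dot_count": host.count("."),
        # 1 if the URL contains '@', which can hide the real host.
        "has_at_symbol": int("@" in url),
        # 1 if the host looks like a dotted IPv4 address instead of a domain name.
        "host_is_ip": int(bool(_IPV4.match(host))),
        # Number of hyphens in the host name.
        "host_hyphen_count": host.count("-"),
        # Number of non-empty path segments.
        "path_depth": len([p for p in parsed.path.split("/") if p]),
    }
```

Vectors of this kind, built for labeled phishing and legitimate URLs, are the sort of input an SVM classifier would be trained on. Retraining as new labeled URLs arrive would correspond to the incremental update step.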
Funding
The National Basic Research Program of China (973 Program) (No. 2010CB328104, 2009CB320501)
the National Natural Science Foundation of China (No. 61272531, 61070158, 61003257, 61060161, 61003311, 41201486)
the National Key Technology R&D Program during the 11th Five-Year Plan Period (No. 2010BAI88B03)
Specialized Research Fund for the Doctoral Program of Higher Education (No. 20110092130002)
the National Science and Technology Major Project (No. 2009ZX03004-004-04)
the Foundation of the Key Laboratory of Network and Information Security of Jiangsu Province (No. BM2003201)
the Key Laboratory of Computer Network and Information Integration of the Ministry of Education of China (No. 93K-9)
Keywords
uniform resource locator (URL) features
phishing detection
support vector machine
incremental learning