Abstract
Phishing websites, which counterfeit the URLs and page content of legitimate websites, have become a serious threat to the privacy and property of Internet users. To counter this threat, the authors propose a method for intelligent phishing website detection that learns from a large number of known normal and phishing websites. Given a website, the method first parses its webpage content and extracts 8 types of features, such as the page title, keywords, and description, to represent the site. Classifiers are then built on these different feature representations. Finally, for a website under test, classification ensemble methods combine the predictions of the individual classifiers to decide whether it is a phishing site. Using this method, the authors developed an intelligent phishing website detection system, IPWDS, which has been integrated into Kingsoft's security products. Experiments on large, real-world datasets show that IPWDS outperforms existing common detection methods and commonly used anti-phishing software in detecting phishing websites.
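The pipeline described in the abstract (extract page features, train one classifier per feature, combine their predictions) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract names only three of the 8 features and does not say which classifiers or ensemble rule IPWDS uses, so the sketch swaps in a toy token-count classifier and a simple majority vote. The names `MetaExtractor`, `TokenVoteClassifier`, `train_ensemble`, and `predict_ensemble` are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser

FEATURES = ("title", "keywords", "description")  # 3 of the 8 features; the rest are not named


class MetaExtractor(HTMLParser):
    """Collects the <title> text and the keywords/description <meta> contents."""

    def __init__(self):
        super().__init__()
        self.features = {name: "" for name in FEATURES}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = (attrs.get("name") or "").lower()
            if name in ("keywords", "description"):
                self.features[name] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.features["title"] += data


def extract_features(html):
    """Parse one page and return its feature strings."""
    parser = MetaExtractor()
    parser.feed(html)
    return parser.features


def tokenize(text):
    return text.lower().replace(",", " ").split()


class TokenVoteClassifier:
    """Toy stand-in classifier (an assumption, not the paper's model)."""

    def __init__(self):
        self.counts = {0: Counter(), 1: Counter()}  # 0 = normal, 1 = phishing

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.counts[label].update(tokenize(text))
        return self

    def predict(self, text):
        tokens = tokenize(text)
        phishing = sum(self.counts[1][t] for t in tokens)
        normal = sum(self.counts[0][t] for t in tokens)
        return 1 if phishing > normal else 0


def train_ensemble(pages, labels):
    """Train one classifier per feature representation."""
    feats = [extract_features(page) for page in pages]
    return {
        name: TokenVoteClassifier().fit([f[name] for f in feats], labels)
        for name in FEATURES
    }


def predict_ensemble(models, html):
    """Combine per-feature predictions by majority vote (assumed rule)."""
    feats = extract_features(html)
    votes = [model.predict(feats[name]) for name, model in models.items()]
    return 1 if sum(votes) * 2 > len(votes) else 0
```

Calling `train_ensemble(pages, labels)` on labeled HTML strings and then `predict_ensemble(models, html)` on a new page returns 1 for a predicted phishing site and 0 otherwise. A real system would replace the toy classifier with a trained model and would likely weight the individual classifiers rather than count votes equally.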
Source
Systems Engineering-Theory & Practice (《系统工程理论与实践》)
EI
CSSCI
CSCD
PKU Core (北大核心)
2011, No. 10, pp. 2008-2020 (13 pages)
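Funding
National Natural Science Foundation of China (10771176)
Guangdong Province Industry-University-Research Major Science and Technology Special Project (2008A09030001)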
Keywords
phishing website
classifier
classification ensemble