Abstract
Computing the similarity between web pages quickly and effectively is the key to detecting phishing pages, and existing phishing detection methods still leave considerable room for improvement in detection performance. This paper proposes a phishing page detection model based on Hungarian matching. The model first extracts a text feature signature, an image feature signature, and an overall page feature signature from the rendered web page, which together give a fairly comprehensive description of the page as a visitor sees it. It then uses the Hungarian algorithm to compute the optimal matching in a bipartite graph built from the signatures of different pages, which identifies the matched feature pairs between them. These matched pairs allow the similarity between pages to be measured more objectively, which in turn improves phishing page detection. A series of simulation experiments shows that the method is feasible and achieves high precision and recall.
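The matching step described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the feature representation or the per-feature similarity, so the sketch assumes that each signature is a list of token sets and that two features are compared by Jaccard similarity. The names `hungarian_min_cost`, `jaccard`, and `page_similarity` are hypothetical. Only the use of the Hungarian algorithm to find an optimal bipartite matching between two signatures comes from the source.

```python
from typing import FrozenSet, List

Feature = FrozenSet[str]  # assumed feature representation: a set of tokens


def hungarian_min_cost(cost: List[List[float]]) -> List[int]:
    """Hungarian algorithm (potentials form), O(n^2 * m).

    Returns assignment[i] = column matched to row i, minimising total cost.
    Requires len(cost) <= len(cost[0]).
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0.0] * (n + 1)   # row potentials (1-based)
    v = [0.0] * (m + 1)   # column potentials (1-based)
    p = [0] * (m + 1)     # p[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)   # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: augmenting path found
                break
        while j0:           # flip the matching along the path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    assignment = [-1] * n
    for j in range(1, m + 1):
        if p[j]:
            assignment[p[j] - 1] = j - 1
    return assignment


def jaccard(a: Feature, b: Feature) -> float:
    """Assumed per-feature similarity; the paper may use a different one."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def page_similarity(sig_a: List[Feature], sig_b: List[Feature]) -> float:
    """Sum of similarities over the optimal matching, normalised by the
    size of the larger signature (an assumed normalisation)."""
    if not sig_a or not sig_b:
        return 0.0
    rows, cols = (sig_a, sig_b) if len(sig_a) <= len(sig_b) else (sig_b, sig_a)
    sim = [[jaccard(r, c) for c in cols] for r in rows]
    # Maximising similarity is the same as minimising (1 - similarity).
    cost = [[1.0 - s for s in row] for row in sim]
    match = hungarian_min_cost(cost)
    return sum(sim[i][j] for i, j in enumerate(match)) / len(cols)
```

In a detector built along these lines, a suspicious page's signature would be compared against the signatures of protected pages, and a similarity above some threshold would flag it. The threshold and the way the text, image, and overall-page scores are combined are not given in the abstract and would have to be taken from the full paper.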
Source
Chinese Journal of Computers (《计算机学报》)
Indexed in: EI, CSCD, Peking University Core Journals (北大核心)
2010, No. 10, pp. 1963-1975 (13 pages)
Funding
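Supported by the National Natural Science Foundation of China (60703086, 60873050, 60803008, 60973046), the Fund of the Jiangsu Provincial Key Laboratory of Computer Information Processing Technology, Soochow University (KJS0714), and the Natural Science Research Program of Jiangsu Higher Education Institutions (09KJB520012).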