Abstract
To address the poor real-time performance of web page feature extraction, this paper proposes grouping features into categories, extracting and classifying each category in parallel, and then fusing the results. First, the main features of three categories are extracted: text, visual, and hyperlink features. These are then classified using a Bayesian algorithm, the EMD algorithm, and a Web crawler, respectively, with the final weights determined from the posterior probabilities. Finally, the three classification results are fused. Three fusion schemes (Bayesian, weighted, and weighted Bayesian) are compared in terms of accuracy, false negative rate, and false positive rate. The experiments show that weighted Bayesian fusion performs best, achieving higher accuracy with lower false positive and false negative rates, and improving both the precision and the real-time performance of detection.
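The pipeline outlined in the abstract (parallel per-category classification, weighted fusion, then evaluation by accuracy, false negative rate, and false positive rate) can be illustrated with a minimal Python sketch. The abstract does not give the fusion formula, so the log-odds weighting below is only one plausible reading of "weighted Bayesian", not the authors' method. The detector functions, the example weights, the prior, and the threshold are all hypothetical placeholders.

```python
from concurrent.futures import ThreadPoolExecutor
from math import exp, log

# Hypothetical stand-ins for the three per-category detectors. Each returns an
# estimated posterior P(phishing | that category's features) in [0, 1].
def text_posterior(page):
    return page["text_p"]

def visual_posterior(page):
    return page["visual_p"]

def link_posterior(page):
    return page["link_p"]

def logit(p, eps=1e-6):
    """Log-odds of p, clamped away from 0 and 1 to avoid infinities."""
    p = min(max(p, eps), 1 - eps)
    return log(p / (1 - p))

def weighted_bayes_fusion(posteriors, weights, prior=0.5):
    """Assumed fusion rule: add each detector's evidence (its log-odds shift
    relative to the prior), scaled by its weight, to the prior log-odds."""
    prior_logit = logit(prior)
    fused_logit = prior_logit + sum(
        w * (logit(p) - prior_logit) for p, w in zip(posteriors, weights)
    )
    return 1 / (1 + exp(-fused_logit))

def classify(page, weights=(0.4, 0.3, 0.3), threshold=0.5):
    """Run the three detectors in parallel, then fuse their posteriors."""
    detectors = (text_posterior, visual_posterior, link_posterior)
    with ThreadPoolExecutor(max_workers=len(detectors)) as pool:
        posteriors = list(pool.map(lambda detect: detect(page), detectors))
    fused = weighted_bayes_fusion(posteriors, weights)
    return fused, fused >= threshold

def evaluate(labels, predictions):
    """Accuracy, false negative rate, and false positive rate (1 = phishing)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, y_hat in pairs if y and y_hat)
    tn = sum(1 for y, y_hat in pairs if not y and not y_hat)
    fp = sum(1 for y, y_hat in pairs if not y and y_hat)
    fn = sum(1 for y, y_hat in pairs if y and not y_hat)
    return {
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "false_negative_rate": fn / (tp + fn) if tp + fn else 0.0,
        "false_positive_rate": fp / (fp + tn) if fp + tn else 0.0,
    }
```

In this sketch, `classify({"text_p": 0.95, "visual_p": 0.6, "link_p": 0.8})` returns the fused probability together with a boolean phishing flag. In a setup closer to the one described, the weights would presumably be derived from each classifier's posterior probabilities rather than fixed by hand.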
Source
《计算机应用研究》
CSCD
PKU Core Journal (北大核心)
2017, No. 4, pp. 1129-1132 (4 pages)
Application Research of Computers
Funding
National Natural Science Foundation of China (Grant No. 61202006)
Nantong Science and Technology Program (Grant No. KB2012027)
Keywords
phishing
feature classification
recognition
algorithm fusion
weighted Bayesian