Abstract
Phishing E-mail detection methods increasingly rely on extracting ever larger numbers of E-mail features, which raises time cost without a clear improvement in detection performance. To address this problem, a phishing E-mail detection method based on density and distance is proposed. The method converts the 42 original E-mail features into 2 new features, one based on density and one based on distance, and then applies a machine learning classification algorithm to detect phishing E-mail. The method reaches a maximum detection accuracy of 99.74% with a classification time of only 3.39 s, 1/20 of that of the traditional algorithm. Experimental results show that the method achieves good detection performance while reducing time cost.
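The abstract does not define how the density-based and distance-based features are computed. The sketch below is a hypothetical illustration only and is not the authors' method. It assumes two definitions. The "density" feature is the share of the k nearest labelled E-mails that are phishing. The "distance" feature is the Euclidean distance to the legitimate-class centroid minus the distance to the phishing-class centroid. The class name `DensityDistanceTransform`, the parameter `k` and the 1/0 label encoding are invented for illustration. The sketch shows the general idea of collapsing a high-dimensional raw feature vector (42-dimensional in the paper) into two values on which a classifier such as an SVM could then be trained.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label encoding (not taken from the paper).
PHISHING, LEGITIMATE = 1, 0


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty collection of vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


class DensityDistanceTransform:
    """Hypothetical mapping of an n-dim raw vector to 2 features (density, distance)."""

    def __init__(self, k: int = 3) -> None:
        self.k = k  # number of neighbours used for the density feature (assumed)

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "DensityDistanceTransform":
        # Store the labelled reference E-mails and precompute one centroid per class.
        self.X, self.y = list(X), list(y)
        self.phish_c = centroid([r for r, lab in zip(X, y) if lab == PHISHING])
        self.legit_c = centroid([r for r, lab in zip(X, y) if lab == LEGITIMATE])
        return self

    def transform_one(self, x: Sequence[float]) -> Tuple[float, float]:
        # Density feature: share of the k nearest reference E-mails labelled phishing.
        order = sorted(range(len(self.X)), key=lambda i: euclidean(x, self.X[i]))
        nearest = order[: self.k]
        density = sum(self.y[i] == PHISHING for i in nearest) / len(nearest)
        # Distance feature: positive when x lies closer to the phishing centroid.
        distance = euclidean(x, self.legit_c) - euclidean(x, self.phish_c)
        return density, distance

    def transform(self, X: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
        # Apply the 2-feature mapping to every vector in X.
        return [self.transform_one(x) for x in X]


if __name__ == "__main__":
    # Toy 3-dim vectors standing in for the 42 raw features.
    train_X = [(1.0, 1.0, 0.9), (0.9, 1.0, 1.0), (1.0, 0.8, 1.0),
               (0.0, 0.1, 0.0), (0.1, 0.0, 0.2), (0.0, 0.0, 0.1)]
    train_y = [1, 1, 1, 0, 0, 0]
    t = DensityDistanceTransform(k=3).fit(train_X, train_y)
    print(t.transform_one((0.95, 0.9, 0.95)))
```

In a full pipeline, the 2-D output would be passed to a classifier such as an SVM. When training-set features are computed with this sketch, each sample counts itself among its own neighbours. A leave-one-out variant would therefore be needed to avoid optimistic results.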
Authors
WANG Xiujuan
ZHANG Chenxi
TANG Haoyang
TAO Yuanrui
Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Source
Journal of Beijing University of Technology
Indexed in: CAS, CSCD, Peking University Core Journals
2019, No. 6, pp. 546-553 (8 pages)
Funding
National Key Research and Development Program of China (2017YFB0802703)
National Natural Science Foundation of China (61602052)
Keywords
machine learning
phishing E-mail
feature extraction
dimensionality reduction
support vector machine (SVM)