Application of stacked denoising autoencoder in spamming filtering

Cited by: 13
Abstract: To address the continually increasing volume of spam, an approach to spam filtering based on the Stacked Denoising Autoencoder (SDA) was proposed. First, to obtain a more abstract and robust feature representation of the raw data, a greedy layer-wise unsupervised algorithm was used to pre-train the SDA by minimizing the reconstruction error on an unlabeled data set. Then, a classifier was added on top of the SDA, and the pre-trained network parameters were fine-tuned with a supervised algorithm by minimizing the classification error on a labeled data set to obtain an optimized model. Finally, the trained SDA was tested on six different public corpora. Precision, recall, and the more balanced Matthews Correlation Coefficient (MCC) were used as the performance measures, and the SDA was compared with the Support Vector Machine (SVM), the Bayesian approach, and the Deep Belief Network (DBN). The experimental results indicate that using the SDA to filter spam gives higher precision and better robustness: it achieves the best average performance, with precision above 95% in all cases, and comes close to perfect prediction, with MCC above 0.88 in all cases.
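The three evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the function names, the label encoding (1 for spam, 0 for legitimate mail), and the toy labels are assumptions made for illustration, and only the standard definitions of precision, recall, and MCC are used.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn), assuming spam is encoded as 1 and legitimate mail as 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def precision(tp, fp):
    """Fraction of messages flagged as spam that really are spam."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Fraction of actual spam messages that were flagged."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient, ranging from -1 to 1 (1 is perfect prediction)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy labels (hypothetical, not taken from the paper's corpora).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 0, 1, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(precision(tp, fp), recall(tp, fn), mcc(tp, tn, fp, fn))
```

Unlike precision and recall, MCC uses all four cells of the confusion matrix, which is why it is described as the more balanced measure when spam and legitimate mail are unevenly represented.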
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2015, No. 11, pp. 3256-3260, 3292 (6 pages)
Keywords: Stacked Denoising Autoencoder (SDA); spam; classification; Support Vector Machine (SVM); Bayesian approach
References (15)

  • [1] GARTNER. Gartner survey shows phishing attacks escalated in 2007; more than $3 billion lost to these attacks[EB/OL]. [2015-02-20]. http://www.gartner.com/it/page.jsp?id=565125.
  • [2] CORMACK G V. Email spam filtering: a systematic review[J]. Foundations and Trends in Information Retrieval, 2007, 1(4): 335-455.
  • [3] ALMEIDA T A, YAMAKAMI A. Advances in spam filtering techniques[M]. Berlin: Springer, 2012: 199-214.
  • [4] SONG Y, KOŁCZ A, GILES C L. Better Naive Bayes classification for high-precision spam detection[J]. Software: Practice and Experience, 2009, 39(11): 1003-1024.
  • [5] CHOUHAN S. Behavior analysis of SVM based spam filtering using various kernel functions and data representations[C]// Proceedings of the 2013 International Journal of Engineering Research and Technology. Gandhinagar: ESRSA Publications, 2013: 3029-3036.
  • [6] HSU W C, YU T Y. Support vector machines parameter selection based on combined Taguchi method and Staelin method for E-mail spam filtering[J]. International Journal of Engineering and Technology Innovation, 2012, 2(2): 113-125.
  • [7] CARUANA G, LI M. A survey of emerging approaches to spam filtering[J]. ACM Computing Surveys, 2012, 44(2): Article 9.
  • [8] ALMEIDA T A, YAMAKAMI A, ALMEIDA J. Evaluation of approaches for dimensionality reduction applied with naive Bayes anti-spam filters[C]// Proceedings of the 2009 IEEE International Conference on Machine Learning and Applications. Piscataway: IEEE, 2009: 517-522.
  • [9] BENGIO Y. Learning deep architectures for AI[J]. Foundations and Trends in Machine Learning, 2009, 2(1): 1-127.
  • [10] VINCENT P, LAROCHELLE H, BENGIO Y, et al. Extracting and composing robust features with denoising autoencoders[C]// Proceedings of the 25th International Conference on Machine Learning. New York: ACM, 2008: 1096-1103.

Co-cited documents: 126

Citing documents: 13

Second-level citing documents: 68
