
Spam Detection Based on Feature Extraction
Abstract: Spam filtering is a typical text classification task, and it suffers from high-dimensional data. To improve the efficiency and accuracy of spam detection, this paper proposes a detection algorithm based on PLS feature extraction and SVM. The original spam data is first reduced in dimension with the partial least squares (PLS) algorithm; a genetic algorithm then searches for the best subset of the transformed features; finally, a support vector machine (SVM) performs the classification. Matlab simulation experiments show that the algorithm effectively reduces data dimensionality and improves detection accuracy.
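Source: Journal of Chaohu University (《巢湖学院学报》), 2014, No. 3, pp. 28-31 (4 pages)
Funding: Key Natural Science Project of Anhui Provincial Universities (Project Nos. KJ2012A205, KJ2013A194); Anhui Provincial Department of Education Project (Project Nos. KJ2010B125, 2010SQRL131)
Keywords: PLS algorithm; SVM algorithm; spam detection; feature extraction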
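
The three stages named in the abstract (PLS dimension reduction, genetic-algorithm search over the transformed features, SVM classification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' Matlab implementation: the data is synthetic, the SVM is a linear one trained with a Pegasos-style subgradient method (the abstract does not say which SVM variant or kernel was used), and every function name and parameter value here is hypothetical.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pls_fit(X, y, k):
    """PLS1 via NIPALS: returns feature means plus k weight and loading vectors."""
    n, d = len(X), len(X[0])
    mu = [sum(row[j] for row in X) / n for j in range(d)]
    y_mean = sum(y) / n
    E = [[row[j] - mu[j] for j in range(d)] for row in X]  # centred X, deflated below
    f = [v - y_mean for v in y]                            # centred y, deflated below
    W, P = [], []
    for _ in range(k):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(d)]  # X^T y
        norm = math.sqrt(dot(w, w))
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]                     # component scores
        tt = dot(t, t)
        p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(d)]
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(d)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        W.append(w)
        P.append(p)
    return mu, W, P

def pls_transform(model, X):
    """Project rows of X onto the fitted PLS components."""
    mu, W, P = model
    out = []
    for row in X:
        e = [v - m for v, m in zip(row, mu)]
        scores = []
        for w, p in zip(W, P):
            t = dot(e, w)
            scores.append(t)
            e = [ej - t * pj for ej, pj in zip(e, p)]
        out.append(scores)
    return out

def train_linear_svm(X, y, lam=0.01, epochs=20):
    """Pegasos-style subgradient descent on the hinge loss (no bias term)."""
    rng = random.Random(0)          # fixed seed keeps the GA fitness deterministic
    w = [0.0] * len(X[0])
    order, step = list(range(len(X))), 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            step += 1
            eta = 1.0 / (lam * step)
            violated = y[i] * dot(w, X[i]) < 1
            w = [(1 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w

def ga_select(fitness, k, rng, pop_size=8, generations=10, p_mut=0.2):
    """Search bitmasks over the k PLS components; the full set seeds the population."""
    pop = [[1] * k] + [[rng.randint(0, 1) for _ in range(k)] for _ in range(pop_size - 1)]
    for _ in range(generations):
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[:pop_size // 2]
        children = [scored[0][:]]   # elitism: the best mask always survives
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, k - 1)
            child = a[:cut] + b[cut:]
            children.append([1 - g if rng.random() < p_mut else g for g in child])
        pop = children
    return max(pop, key=fitness)

def make_data(rng, n=120, d=20):
    """Synthetic stand-in for a spam feature matrix: +1 = spam, -1 = legitimate."""
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([rng.gauss(0, 1) + (2.0 * label if j < 3 else 0.0) for j in range(d)])
        y.append(label)
    return X, y

def run_demo(seed=0, k=4):
    rng = random.Random(seed)
    X, y = make_data(rng)
    X_tr, y_tr, X_va, y_va = X[:80], y[:80], X[80:], y[80:]
    model = pls_fit(X_tr, y_tr, k)
    T_tr, T_va = pls_transform(model, X_tr), pls_transform(model, X_va)
    cache = {}
    def fitness(mask):              # validation accuracy of an SVM on the chosen components
        key = tuple(mask)
        if key not in cache:
            cols = [j for j, g in enumerate(mask) if g]
            if not cols:
                cache[key] = 0.0
            else:
                w = train_linear_svm([[r[j] for j in cols] for r in T_tr], y_tr)
                hits = sum(1 for r, lab in zip(T_va, y_va)
                           if lab * dot(w, [r[j] for j in cols]) > 0)
                cache[key] = hits / len(y_va)
        return cache[key]
    best = ga_select(fitness, k, rng)
    return best, fitness(best)

if __name__ == "__main__":
    print(run_demo())
```

`run_demo()` returns the selected component mask and its validation accuracy. Seeding the population with the all-ones mask and keeping the elite each generation is a design choice of this sketch: it guarantees the selected subset is never worse on the validation split than using every PLS component.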