
A Spam Filtering Method Based on Feature Selection Fusion
Abstract: Traditional spam filters that rely on a single feature selection method either fail to extract all the important features in the training set or produce a redundant feature set. To address this, the paper proposes SF_FSF (Spam Filtering based on Feature Selection Fusion), a spam filtering model that fuses the results of multiple feature selection methods. SF_FSF introduces the concept of information fusion, treats feature selection as a decision-making problem, and fuses the individual feature selection results with an information fusion model based on average voting, so as to extract the important features of the spam dataset and achieve strong filtering performance. Experimental results show that SF_FSF achieves better filtering results than spam filtering methods based on a single feature selection method.
Author: Bai Ning (白宁)
Source: Computer Applications and Software (《计算机应用与软件》), CSCD, Peking University Core Journal, 2014, No. 4, pp. 31-34 (4 pages)
Keywords: spam filtering; feature selection; information fusion; average voting method
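
The abstract names the fusion step (average voting over several feature selection results) but does not give its details. The following is only a minimal sketch of one plausible reading, not the paper's actual procedure: each selector votes for its k highest-scoring features, and a feature is kept when its mean vote across selectors exceeds a threshold. The function names, the top-k voting rule, the threshold, and the example scores are all assumptions made for illustration.

```python
from typing import Dict, List


def top_k_votes(scores: Dict[str, float], k: int) -> Dict[str, int]:
    """Turn one selector's scores into 0/1 votes: 1 for its k best features."""
    # Sort by descending score; break ties by feature name for determinism.
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    chosen = set(ranked[:k])
    return {f: int(f in chosen) for f in scores}


def fuse_by_average_vote(
    selector_scores: List[Dict[str, float]], k: int, threshold: float = 0.5
) -> List[str]:
    """Keep features whose mean vote across all selectors exceeds threshold."""
    votes = [top_k_votes(s, k) for s in selector_scores]
    features = sorted(set().union(*selector_scores))
    mean_vote = {
        f: sum(v.get(f, 0) for v in votes) / len(votes) for f in features
    }
    return [f for f in features if mean_vote[f] > threshold]


# Hypothetical scores from three selectors (for example information gain,
# chi-square, and mutual information); the numbers are made up.
ig = {"free": 0.9, "meeting": 0.1, "winner": 0.8, "click": 0.4}
chi2 = {"free": 7.0, "meeting": 0.5, "winner": 2.0, "click": 6.0}
mi = {"free": 0.3, "meeting": 0.05, "winner": 0.4, "click": 0.1}

selected = fuse_by_average_vote([ig, chi2, mi], k=2)
print(selected)  # ['free', 'winner']
```

In this toy run, "free" is voted for by all three selectors and "winner" by two of them, so both pass the 0.5 threshold, while "click" (one vote) and "meeting" (none) are dropped. Voting on ranks rather than raw scores avoids having to normalize scores that live on different scales.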
