Research on the feature selection method for spam filtering

Times cited: 2
Abstract: This paper studies feature selection for spam filtering. A word co-occurrence model is introduced to capture the semantic relations between words, and it is combined with the traditional information gain feature selection method to choose the features that represent each email. A neural network then classifies the emails, which yields the spam filter. Experiments show that the proposed feature selection method, which combines word co-occurrence pairs with information gain, improves the precision of spam filtering.
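The abstract does not spell out exactly how co-occurrence pairs and information gain are combined. The Python sketch below shows one plausible reading, assuming that word pairs occurring within a small window are added as extra binary features and that words and pairs are ranked together by information gain, IG(f) = H(C) − [P(f)H(C|f) + P(¬f)H(C|¬f)]. The function names, the window size, and the toy emails are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(feature_sets, labels, feature):
    """IG of a binary presence feature: H(C) minus the weighted entropy of each split."""
    n = len(labels)
    with_f = [y for fs, y in zip(feature_sets, labels) if feature in fs]
    without_f = [y for fs, y in zip(feature_sets, labels) if feature not in fs]
    conditional = sum(len(part) / n * entropy(part) for part in (with_f, without_f) if part)
    return entropy(labels) - conditional


def featurize(text, window=2):
    """Hypothetical representation: single words plus co-occurring word pairs.

    A pair is recorded when two distinct words appear within `window`
    consecutive tokens of each other (window=2 means adjacent words only).
    """
    tokens = text.lower().split()
    features = set(tokens)
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + window]:
            if a != b:
                features.add(tuple(sorted((a, b))))
    return features


def select_features(texts, labels, k, window=2):
    """Rank words and pairs together by IG (ties broken by name) and keep the top k."""
    feature_sets = [featurize(t, window) for t in texts]
    candidates = set().union(*feature_sets)
    ranked = sorted(candidates, key=lambda f: (-information_gain(feature_sets, labels, f), str(f)))
    return ranked[:k]


# Toy usage with made-up emails.
emails = ["win free money now", "free money offer", "meeting agenda for monday", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
print(select_features(emails, labels, k=4))
# -> [('free', 'money'), 'free', 'meeting', 'money']
```

Here the pair ("free", "money") scores as high as any single word, which is the kind of extra, semantically linked feature a co-occurrence model is meant to contribute.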
Source: Journal of Hefei University of Technology (Natural Science), 2009, No. 12, pp. 1863-1866 (4 pages). Indexed in CAS, CSCD, and the Peking University Core Journals list.
Keywords: spam filtering; information gain; word co-occurrence model; neural network; cross-covering algorithm
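The keywords name a cross-covering algorithm as the neural-network classifier. The Python sketch below is a simplified sphere-covering classifier that assumes the usual covering-algorithm construction: lift every vector onto a sphere by adding one coordinate, then grow one class-pure "cover" (a center plus an inner-product threshold) around each training sample that is not yet covered. It processes samples in input order rather than alternating between classes, so it should not be read as the paper's exact procedure; the function names and toy vectors are hypothetical.

```python
import math


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def lift_to_sphere(x, radius):
    """Append one coordinate so that every lifted vector has norm `radius`."""
    return list(x) + [math.sqrt(max(radius * radius - dot(x, x), 0.0))]


def train_covers(X, y):
    """Greedily build class-pure covers as (center, threshold, label) triples.

    Assumes at least two classes and no identical vectors with different labels.
    """
    radius = max(math.sqrt(dot(x, x)) for x in X)
    P = [lift_to_sphere(x, radius) for x in X]
    covered = [False] * len(P)
    covers = []
    for i, (center, label) in enumerate(zip(P, y)):
        if covered[i]:
            continue
        sims = [dot(center, p) for p in P]
        # d1: similarity to the closest sample of another class.
        d1 = max(s for s, lab in zip(sims, y) if lab != label)
        # d2: the least similar same-class sample that is still closer than d1.
        d2 = min(s for s, lab in zip(sims, y) if lab == label and s > d1)
        theta = (d1 + d2) / 2
        covers.append((center, theta, label))
        for j, (s, lab) in enumerate(zip(sims, y)):
            if lab == label and s > theta:
                covered[j] = True
    return radius, covers


def predict(model, x):
    """Return the label of the cover that x lies deepest inside (or closest to)."""
    radius, covers = model
    p = lift_to_sphere(x, radius)
    _, _, label = max(covers, key=lambda c: dot(c[0], p) - c[1])
    return label


# Toy usage: 2-D vectors, e.g. weights of two selected features.
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
y = ["spam", "spam", "ham", "ham"]
model = train_covers(X, y)
print(predict(model, [0.95, 0.05]), predict(model, [0.05, 0.95]))
# -> spam ham
```

Because each threshold lies strictly between d1 and d2, no cover contains a training sample of another class; a test email outside every cover is assigned to the cover whose boundary it is nearest.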