
An Improved E-mail Classifier Based on Support Vector Machine (cited by: 2)
Abstract  In practical e-mail filtering applications, certain characteristics of spam itself make it error-prone to assign an e-mail sample definitively to one class, as the traditional support vector machine (SVM) classification model does. Deciding class membership from a probabilistic output is more reasonable. Based on this idea, this paper proposes an optimization of the traditional SVM e-mail classifier: a probability is computed from the classification output and compared against a threshold to determine the class to which the e-mail belongs. Experiments show that the method is effective and feasible.
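Source  Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2007, No. 9, pp. 90-92 (3 pages)
Funding  Supported by the Natural Science Foundation of the Chongqing Science and Technology Commission (grant no. CSTC2006BB2021)
Keywords  support vector machine; text classification; e-mail filtering; e-mail classifier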
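
The idea summarized in the abstract, turning the SVM output into a probability and then comparing it with a threshold, can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the probability is computed, so the sketch uses a sigmoid mapping of the raw decision value (the form commonly associated with Platt scaling). The function names, the sigmoid parameters `a` and `b`, and the threshold value are all hypothetical and are not taken from the paper.

```python
import math


def platt_probability(decision_value: float, a: float = -1.0, b: float = 0.0) -> float:
    """Map a raw SVM decision value f to P(spam | f) = 1 / (1 + exp(a*f + b)).

    a and b are placeholder values; in practice they would be fitted on
    held-out data. This mapping is an assumption, not the paper's formula.
    """
    z = a * decision_value + b
    # Numerically stable evaluation of 1 / (1 + exp(z)).
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def classify(decision_value: float, threshold: float = 0.9,
             a: float = -1.0, b: float = 0.0) -> str:
    """Label an e-mail as spam only if its estimated probability reaches the threshold."""
    p = platt_probability(decision_value, a, b)
    return "spam" if p >= threshold else "legitimate"


if __name__ == "__main__":
    # Hypothetical decision values from an already trained SVM.
    for f in (3.0, 1.0, -2.0):
        print(f, round(platt_probability(f), 3), classify(f))
```

With these placeholder parameters, a sample with decision value 1.0 lies on the spam side of the hyperplane, so a hard-decision SVM would call it spam, but its estimated probability falls short of a 0.9 threshold and it is kept as legitimate. Raising the threshold trades missed spam for fewer legitimate messages wrongly filtered.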

