A Method of Spam Filtering Based on Online Linear Discriminative Learning Model


Abstract: Spam filtering is an important task in Internet applications. This paper presents a spam filtering method based on an online linear discriminative learning model. Features are extracted using Bayesian theory and grouped according to the position in which they occur, and each group is assigned its own weight. The model is evaluated on the TREC spam corpus and compared with the TREC evaluation results. The experiments show that the linear discriminative model produces competitive results.
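The idea summarized in the abstract can be illustrated with a minimal sketch. This record does not give the paper's exact model, update rule, feature groups, or weight values, so everything specific here is an assumption: logistic regression with a per-message gradient update stands in for the online linear discriminative model, the field names `subject` and `body` stand in for feature positions, and the `GROUP_WEIGHTS` values are hypothetical. The Bayesian feature-extraction step is omitted.

```python
import math

# Hypothetical per-position weights; the paper's actual groups and values are not given here.
GROUP_WEIGHTS = {"subject": 2.0, "body": 1.0}


def extract_features(email):
    """Map an email (dict of field -> text) to {feature: value}.

    Each token is tagged with the field it occurs in, and its value is
    the weight of that position group.
    """
    feats = {}
    for field, text in email.items():
        scale = GROUP_WEIGHTS.get(field, 1.0)
        for token in set(text.lower().split()):
            feats[f"{field}:{token}"] = scale
    return feats


class OnlineLinearFilter:
    """Online logistic regression (an assumed stand-in for the paper's model)."""

    def __init__(self, learning_rate=0.1):
        self.w = {}
        self.learning_rate = learning_rate

    def score(self, feats):
        """Return the estimated probability that the message is spam."""
        z = sum(self.w.get(f, 0.0) * v for f, v in feats.items())
        z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, feats, is_spam):
        """Take one gradient step on a single labeled message."""
        error = (1.0 if is_spam else 0.0) - self.score(feats)
        for f, v in feats.items():
            self.w[f] = self.w.get(f, 0.0) + self.learning_rate * error * v


# Toy usage: classify each message, then learn from its label, one at a time.
spam = {"subject": "cheap pills", "body": "buy cheap pills now"}
ham = {"subject": "meeting notes", "body": "see the notes from the meeting"}

flt = OnlineLinearFilter()
for _ in range(20):
    flt.update(extract_features(spam), is_spam=True)
    flt.update(extract_features(ham), is_spam=False)
```

Scaling feature values by position is only one way to weight groups differently; a separate learning rate per group would be another option under the same assumptions.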

Authors: Li Jun (李军), Qi Haoliang (齐浩亮), Han Zhongyuan (韩中元), Lei Guohua (雷国华)

Affiliation: Department of Computer Science and Technology, Heilongjiang Institute of Technology (黑龙江工程学院)

Source: Journal of Harbin University of Science and Technology (《哈尔滨理工大学学报》), CAS, 2008, No. 3, pp. 48-50 (3 pages)

Keywords: spam filtering; discriminative learning model; feature extraction; Bayesian theory; active learning

Classification number: TP393.08 [Automation and Computer Technology: Computer Application Technology]



