A spam filtering method based on active Bayesian classification technology (cited by: 1)

Abstract: Filtering methods that combine machine learning, text categorization, and information filtering have become an active research topic. Current estimates indicate that nearly sixty percent of email traffic is regarded as spam and there is little reason to expect this to continue. In practical email filtering, the training set often contains a large number of unlabeled emails. Existing state-of-the-art classification methods must label these emails first, which incurs heavy time overhead and reduces classification accuracy. This paper therefore proposes RANB, an active Bayesian classification method, which labels the classes of the unlabeled training emails as a preprocessing step. Experiments show that, while maintaining filter capability relative to classical methods, the approach effectively improves the quality of the training samples and performs better on the evaluation metrics.
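The abstract does not spell out how RANB chooses which emails to label, so the following is only a minimal sketch of the general idea: pool-based active learning around a naive Bayes classifier. It assumes uncertainty sampling as the query strategy and a multinomial model with Laplace smoothing. Neither choice is confirmed by the record, and all names (`train_nb`, `posterior`, `active_label`, `oracle`) are hypothetical.

```python
import math
from collections import Counter

def train_nb(labeled):
    """Fit multinomial naive Bayes counts from (tokens, label) pairs.
    Assumes `labeled` is non-empty."""
    class_docs = Counter(label for _, label in labeled)
    word_counts = {c: Counter() for c in class_docs}
    for tokens, label in labeled:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab

def posterior(model, tokens):
    """Return P(class | tokens) for every class seen in training."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    log_scores = {}
    for c, n_docs in class_docs.items():
        total_words = sum(word_counts[c].values())
        score = math.log(n_docs / total_docs)
        for w in tokens:
            if w in vocab:  # words never seen in training are ignored
                # Laplace (add-one) smoothing
                score += math.log((word_counts[c][w] + 1) / (total_words + len(vocab)))
        log_scores[c] = score
    top = max(log_scores.values())  # subtract the max for numerical stability
    exp_scores = {c: math.exp(s - top) for c, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {c: v / norm for c, v in exp_scores.items()}

def active_label(seed, pool, oracle, budget):
    """Query the oracle for the `budget` most uncertain emails (assumed
    strategy, not necessarily RANB's), then predict labels for the rest."""
    labeled, pool = list(seed), list(pool)
    for _ in range(min(budget, len(pool))):
        model = train_nb(labeled)
        # Most uncertain = lowest posterior for the most probable class.
        idx = min(range(len(pool)),
                  key=lambda i: max(posterior(model, pool[i]).values()))
        tokens = pool.pop(idx)
        labeled.append((tokens, oracle(tokens)))
    model = train_nb(labeled)
    predicted = []
    for tokens in pool:
        probs = posterior(model, tokens)
        predicted.append((tokens, max(probs, key=probs.get)))
    return labeled, predicted
```

Here `oracle` stands in for a human annotator. In use, `seed` would be a handful of hand-labeled emails, `pool` the unlabeled training emails, and `budget` the number of manual labels one is willing to pay for. The returned `labeled` and `predicted` lists together form the preprocessed training set.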

Authors: Li Di (李笛), Zhang Yuhong (张玉红), Hu Xuegang (胡学钢)

Institution: School of Computer and Information, Hefei University of Technology (合肥工业大学计算机与信息学院)

Source: Journal of Hefei University of Technology: Natural Science (《合肥工业大学学报（自然科学版）》), indexed in CAS, CSCD, and the Peking University Core list; 2008, No. 9, pp. 1443-1446 (4 pages)

Funding: Natural Science Foundation of Anhui Province (050420207)

Keywords: spam; machine learning; text categorization; information filtering; active learning; naive Bayes classification

Classification: TP181 [Automation and Computer Technology: Control Theory and Control Engineering]
