Research on an NB-Based Double-Level Classification Model for E-mail Filtering (cited by: 1)

The Research of NB-based DLB Classification Anti-spam


Abstract: Classifying e-mail with the naive Bayes (NB) classifier model is currently a focus of research on content-based spam filtering. When the feature parameters are only weakly related, naive Bayes is simple and effective. However, its assumption that the feature parameters are conditionally independent cannot express the semantic relationships between them, which degrades classification performance. Building on the naive Bayes classifier model, we propose a double-level Bayes classifier model (DLB, Double Level Bayes) that accounts for the influence between parameters while retaining the advantages of the naive Bayes classifier model. We also compare the performance of the DLB model with that of the naive Bayes classifier model. Simulation experiments show that, under most conditions, the DLB classifier model outperforms the naive Bayes classifier model in spam filtering.
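To make the conditional independence assumption mentioned in the abstract concrete, the following is a minimal sketch of a plain naive Bayes spam classifier. It is not the DLB model, whose details are not given in this record. The function names (`train_nb`, `log_posterior`, `classify`), the toy messages, and the use of add-one smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Estimate class priors and per-class token counts from tokenized messages."""
    classes = sorted(set(labels))
    vocab = {tok for doc in docs for tok in doc}
    priors, counts, totals = {}, {}, {}
    for c in classes:
        class_docs = [d for d, y in zip(docs, labels) if y == c]
        priors[c] = len(class_docs) / len(docs)
        counts[c] = Counter(tok for d in class_docs for tok in d)
        totals[c] = sum(counts[c].values())
    return {"classes": classes, "vocab": vocab, "priors": priors,
            "counts": counts, "totals": totals}

def log_posterior(model, doc, c):
    """Unnormalized log P(c | doc), treating tokens as independent given the class."""
    v = len(model["vocab"])
    score = math.log(model["priors"][c])
    for tok in doc:
        if tok in model["vocab"]:  # tokens never seen in training are skipped
            # Add-one (Laplace) smoothing: an illustrative choice, not from the paper.
            score += math.log((model["counts"][c][tok] + 1) / (model["totals"][c] + v))
    return score

def classify(model, doc):
    """Return the class with the highest log posterior."""
    return max(model["classes"], key=lambda c: log_posterior(model, doc, c))

# Hypothetical toy training data.
docs = [
    ["cheap", "pills", "buy"],
    ["buy", "now", "cheap"],
    ["meeting", "agenda", "today"],
    ["project", "meeting", "notes"],
]
labels = ["spam", "spam", "ham", "ham"]
model = train_nb(docs, labels)
```

Because each token contributes its own independent log term, the score cannot capture that two words mean something different together than apart. That is the limitation the abstract says the DLB model is meant to address.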

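Authors: 惠孛, 吴跃, 陈佳

Affiliation: University of Electronic Science and Technology of China (电子科技大学)

Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal; 2006, No. 5, pp. 110-112 (3 pages)

Keywords: spam filtering, naive Bayesian classifier model, double-level classification model (DLB model)

Classification: TP311.56 [Automation and Computer Technology: Computer Software and Theory]; TP393.098 [Automation and Computer Technology: Computer Application Technology]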


