LDA Feature Selection Method for Spam Processing (垃圾邮件处理中LDA特征选择方法), cited by: 1

LDA based feature selection for spam filter


Abstract: Spam filtering is a long-standing research problem, and a growing number of text categorization techniques have been adapted to it. Latent Dirichlet Allocation (LDA) and related topic models are increasingly popular tools for summarization, manifold discovery, and other applications to discrete data. This paper introduces LDA into spam filtering as a feature selection method. Combining LDA-based feature selection with a centroid + kNN classifier yields a simple test spam filter. Preliminary experimental results show that the features selected by LDA outperform the baseline features selected by information gain (IG) and mutual information (MI), and that the filtering performance of the test filter is comparable to that of other filters.
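The abstract compares LDA-based selection against the IG baseline without giving formulas. As a rough illustration of that baseline only (not the authors' LDA method, whose details are not given in this record), the sketch below scores each term by information gain on term presence, IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), and keeps the top k terms. The function names, the toy messages, and the choice to represent each message as a set of tokens are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a label -> count mapping."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c)


def information_gain(docs, labels, term):
    """IG of a term, based on whether the term is present in each document."""
    with_t = Counter(l for d, l in zip(docs, labels) if term in d)
    without_t = Counter(l for d, l in zip(docs, labels) if term not in d)
    n = len(docs)
    gain = entropy(Counter(labels))  # H(C)
    for part in (with_t, without_t):
        size = sum(part.values())
        if size:  # skip an empty partition to avoid dividing by zero
            gain -= (size / n) * entropy(part)
    return gain


def select_features(docs, labels, k):
    """Return the k terms with highest IG; ties are broken alphabetically."""
    vocab = sorted({t for d in docs for t in d})
    ranked = sorted(vocab,
                    key=lambda t: (-information_gain(docs, labels, t), t))
    return ranked[:k]


# Hypothetical toy corpus: each message is a set of tokens.
docs = [
    {"cheap", "pills", "now"},
    {"cheap", "offer", "now"},
    {"meeting", "agenda", "now"},
    {"project", "meeting", "notes"},
]
labels = ["spam", "spam", "ham", "ham"]

top_terms = select_features(docs, labels, 2)
print(top_terms)
```

On this toy corpus, "cheap" and "meeting" each split the two classes perfectly, so they rank above terms such as "now" that occur in both classes. An LDA-based selector would replace the scoring function while leaving the downstream classifier unchanged.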

Authors: 袁伯秋 (Yuan Boqiu), 周一民 (Zhou Yimin), 李林 (Li Lin)

Affiliation: School of Computer Science, Beihang University (Beijing University of Aeronautics and Astronautics)

Source: Computer Engineering and Applications (《计算机工程与应用》), indexed in CSCD and the Peking University Core Journals list, 2009, Issue 25, pp. 121-124 (4 pages)

Keywords: spam filter; Latent Dirichlet Allocation (LDA); feature selection

Classification number: TP39 [Automation and Computer Technology: Computer Application Technology]





