Spam short message classifier model based on word terms

Cited by: 11
Abstract: To address the problem of classifying spam short messages (SSM), a classifier model based on feature words (word terms) is proposed. A word-category weight is introduced to represent how strongly a word indicates the category to which a short message belongs, together with a method for calculating it. Based on these weights, dimensionality reduction is performed to obtain the set of feature words used for classification. The Short message-Category Membership Value (SCMV) is defined to express the degree to which a short message belongs to a category, and classification is carried out by computing the SCMV and the SCMV density. To improve classification accuracy and keep the word-category weights reasonable, the weights of the feature words are refined through iterative learning. The experimental results show that the proposed model is superior to other classification methods in terms of classification performance and time complexity.
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal, 2013, No. 5, pp. 1334-1337 (4 pages)
Funding: National Spark Program project (2011GA690190)
Keywords: spam short message; word term; text classification; dimensionality reduction; weight learning
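
The pipeline outlined in the abstract (word-category weights, feature-word selection, membership value, membership density) can be illustrated with a rough sketch. This record does not give the paper's formulas, so everything below is an assumption made for illustration only: the weight is taken to be the difference between a word's document frequency in spam and in legitimate messages, the membership value is taken to be the sum of the weights of the feature words in a message, and the density is that sum divided by message length. All function names are hypothetical, and the iterative weight-learning step is omitted.

```python
from collections import Counter


def word_category_weights(messages, labels):
    """Assumed weight (not the paper's formula): share of spam messages
    containing the word minus share of ham messages containing it."""
    spam_df, ham_df = Counter(), Counter()
    n_spam = sum(1 for y in labels if y == "spam")
    n_ham = len(labels) - n_spam
    for tokens, y in zip(messages, labels):
        # Count each word at most once per message (document frequency).
        (spam_df if y == "spam" else ham_df).update(set(tokens))
    vocab = set(spam_df) | set(ham_df)
    return {
        w: spam_df[w] / max(n_spam, 1) - ham_df[w] / max(n_ham, 1)
        for w in vocab
    }


def select_feature_words(weights, k):
    """Dimensionality reduction: keep the k words with the largest |weight|."""
    ranked = sorted(weights, key=lambda w: abs(weights[w]), reverse=True)
    return {w: weights[w] for w in ranked[:k]}


def membership(tokens, features):
    """Assumed membership value: sum of feature-word weights in the message."""
    return sum(features.get(t, 0.0) for t in tokens)


def membership_density(tokens, features):
    """Assumed density: membership value per token (0.0 for an empty message)."""
    return membership(tokens, features) / len(tokens) if tokens else 0.0


def classify(tokens, features, threshold=0.0):
    """Label a tokenized message as spam when its density exceeds the threshold."""
    return "spam" if membership_density(tokens, features) > threshold else "ham"


if __name__ == "__main__":
    # Toy, made-up training data; real messages would need word segmentation first.
    train = [["free", "prize", "call"], ["win", "free", "cash"],
             ["meet", "lunch", "today"], ["call", "me", "today"]]
    y = ["spam", "spam", "ham", "ham"]
    feats = select_feature_words(word_category_weights(train, y), k=2)
    print(classify(["free", "gift"], feats))
```

Dividing by message length is one plausible reading of "density": it keeps a long message from being flagged merely because it contains more words. The threshold of 0.0 is likewise a placeholder rather than a value taken from the paper.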