
Research on the Application of the kNN Algorithm to SMS Client Classification (cited by: 1)
Abstract: This paper studies and implements an SMS client classification system based on the kNN algorithm. Two feature vector sets, one for normal SMS and one for spam SMS, are extracted from a self-built SMS corpus. Preprocessing, dimensionality reduction, and removal of feature terms with very low term frequency ensure that each feature vector set carries as many of the characteristic terms of its class as possible. The corpus is split into a reference set, against which incoming messages are compared, and a test set. The study finds that classification works best when the reference set contains n = 600 messages: a smaller n lowers the recognition rate, while a larger n increases the time complexity of classification. The best number of nearest neighbors is k = 25. The study also finds that results are best when the probability difference used in selecting the k messages is between 1% and 2% and the count difference used in deciding the message class is between 5 and 15. Following the principle of preserving the pass rate of normal SMS while raising the recognition rate of spam SMS, the final system parameters are n = 600, k = 25, a probability difference of 1.5%, and a count difference of 9, with which the recognition rates for normal and spam SMS reach up to 97.3% and 89%, respectively.
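As an illustration of the decision rule described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes whitespace tokenization, raw term-frequency vectors, and cosine similarity, none of which the abstract specifies. It also reads the "count difference" as the margin by which spam neighbors must outnumber normal neighbors before a message is labeled spam; the "probability difference" is not defined in the abstract and is left out. The names `vectorize`, `cosine`, `knn_classify`, and `count_margin` are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Term-frequency vector; whitespace tokenization is an assumption."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(message, reference, k=25, count_margin=9):
    """Label a message "spam" or "normal" from its k nearest reference messages.

    reference: list of (text, label) pairs, label in {"spam", "normal"}.
    count_margin: assumed reading of the abstract's "count difference";
    spam must lead normal by at least this many neighbors, otherwise the
    message passes as normal.
    """
    query = vectorize(message)
    scored = sorted(
        ((cosine(query, vectorize(text)), label) for text, label in reference),
        key=lambda pair: pair[0],
        reverse=True,
    )
    top = scored[:k]
    spam = sum(1 for _, label in top if label == "spam")
    normal = len(top) - spam
    return "spam" if spam - normal >= count_margin else "normal"
```

Defaulting to "normal" whenever the margin is not met reflects the stated principle of protecting the pass rate of normal SMS: a larger `count_margin` lets more normal messages through at the cost of missing some spam.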
Source: Journal of Shandong Agricultural University (Natural Science Edition), indexed in CSCD and the Peking University Core Journals list, 2014, No. 2, pp. 216-222 (7 pages)
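Funding: Anhui Provincial Natural Science Research Project for Higher Education Institutions (KJ2012B181); Anhui Provincial Natural Science Research Project for Higher Education Institutions (KJ2012B183)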
Keywords: SMS classification; k-nearest neighbor (kNN) algorithm; feature vector set; vector space model
