
Method of Spam Filtering Based on General Suffix Tree Model
Abstract This paper proposes a content-based spam filtering method. It adopts the general suffix tree model (GSTM) and uses the contextual information of the mail content to compute variable-length n-gram statistics at every text position. From these statistics it obtains the similarity between the mail under test and each training set, and assigns the mail to a category accordingly. Theoretical analysis and experiments show that, on the same corpus, the method's precision and recall match or exceed those of mail filtering methods based on the vector space model (VSM). Filtering a mail of length N takes O(N) time, and adding a new mail of length N to the training set takes O(N) training time, so the training set can grow dynamically. The method requires no word segmentation and is fully language independent, which makes it suitable when mail in several languages is present at the same time.
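The matching idea in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: a depth-limited character trie of suffixes stands in for the general suffix tree, and the similarity score (the average longest-match length per text position) is an assumption chosen for illustration, since the abstract does not give the scoring formula. The names `SuffixTrie`, `similarity`, `classify`, and `max_depth` are hypothetical.

```python
class SuffixTrie:
    """Depth-limited character trie over all suffixes of the training mails.

    Assumption: this simplified structure replaces a true general suffix tree.
    With a constant max_depth, adding or scanning a mail of length N costs
    O(N * max_depth) character steps.
    """

    def __init__(self, max_depth=8):
        self.max_depth = max_depth
        self.root = {}  # nested dicts: char -> child node

    def add(self, text):
        """Insert every suffix of text, truncated to max_depth characters.

        Can be called at any time, so the training set may grow incrementally.
        """
        for i in range(len(text)):
            node = self.root
            for ch in text[i:i + self.max_depth]:
                node = node.setdefault(ch, {})

    def match_length(self, text, i):
        """Length of the longest prefix of text[i:] found in the trie."""
        node = self.root
        length = 0
        for ch in text[i:i + self.max_depth]:
            node = node.get(ch)
            if node is None:
                break
            length += 1
        return length


def similarity(trie, text):
    """Average longest-match length per position, scaled into [0, 1]."""
    if not text:
        return 0.0
    total = sum(trie.match_length(text, i) for i in range(len(text)))
    return total / (len(text) * trie.max_depth)


def classify(tries, text):
    """Return the label whose training set is most similar to text."""
    return max(tries, key=lambda label: similarity(tries[label], text))


# Usage: one trie per category; no word segmentation is needed because
# matching works on raw characters.
tries = {"spam": SuffixTrie(), "ham": SuffixTrie()}
for mail in ["buy cheap pills now", "cheap pills online"]:
    tries["spam"].add(mail)
for mail in ["meeting at noon tomorrow", "see you at the meeting"]:
    tries["ham"].add(mail)

print(classify(tries, "cheap pills"))
print(classify(tries, "meeting tomorrow"))
```

Because the sketch matches raw characters rather than words, the same code applies unchanged to text in any language, which mirrors the language independence claimed in the abstract.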
Source Computer Engineering (《计算机工程》), indexed in CAS, CSCD, and the Peking University Core Journals list, 2007, No. 9, pp. 100-102 (3 pages)
Keywords text classification; spam; general suffix tree
