期刊文献+

一种基于词汇相关度的网络文本分类算法研究

Research of,Web Text Classification Algorithm based on Lexical Relatedness
原文传递
导出
摘要 传统文本分类算法,在特征选择这一阶段,采用统计观点和方法机械处理词语与类别的联系,假定词语之间相互独立,忽略特征关键词之间的语义关系。本文提出一种新的特征选择方法,用基于上下文统计的词汇相关度方法,计算特征词之间的词汇相关度,设定相关度阀值,进行特征选择。降低了特征空间的高维稀疏性,并有效的减少噪声,提高了分类精度和算法效率。 Traditional text classification algorithms,on the stage of feature selection,use statistical point and methods handle the links between words and categories,and assume that words are independent,ignore the semantic relationships between keywords.This paper presents a new feature selection method,and use lexical relatedness based on the context of statistics,calculate the words’lexical relatedness and set the relevant threshold values for feature selection.Reduce the scarcity of high dimensional feature space,and effectively reduce noise,improve the classification accuracy and efficiency of the algorithm.
作者 邱前智 刘忠
机构地区 桂林理工大学
出处 《网络安全技术与应用》 2012年第5期33-34,40,共3页 Network Security Technology & Application
关键词 文本分类 特征选择 词汇相关度 Text Categorization Feature Selection Lexical Relatedness
  • 相关文献

参考文献4

二级参考文献22

  • 1王斌,潘文锋.基于内容的垃圾邮件过滤技术综述[J].中文信息学报,2005,19(5):1-10. 被引量:129
  • 2William W. Cohen. Fast effective rule induction[C]// Machine Learning Proceedings of the Twelfth International Conference on Machine Learning. Tahoe City, California, USA: Morgan Kaufmann, 1995: 115-123.
  • 3X. Carreras, L. Marquez. Boosting Trees for Anti Spam Email Filtering [C]//Proceedings of Euro Conference Recent Advances in NLP (RANLP-2001). 2001: 58-64.
  • 4I. Androutsopoulos, G. Paliouras, V. Karkaletsis, etc, Learning to Filter Spam E-Mail: A Comparison of a Naive Bayesian and a Memory-Based Approach[C]// Proc. 4th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD 2000). 2000: 1-13.
  • 5H. Drueker, D. Wu, V. N. Vapnik, Support Vector Machines for Spam Categorization [ J/OL ]. IEEE Transactions on Neural Networks, 1999, 20 (5) : 1048-1054.
  • 6M. Sahami, S. Dumais, D. Heckerman etc, A Bayesian approach to filtering junk e-mail [C]//Proc. of AAAI Workshop on Learning for Text Categorization. 1998: 55-62.
  • 7Peat H J, Willet P. The limitations of term co-occurrence data for query expansion in document retrieval systems [J/OL]. JASIS, 1991, 42(5):378-383.
  • 8G Salton, A Wong, C S Yang. On the specification of term values in automatic indexing [J/OL]. Journal of Documentation, 1973, 29(4) :351-372.
  • 9Y. Yang. A Comparative Study on Feature Selection in Text Categorization [C]//Proceeding of the Fourteenth International Conference on Machine Learning (ICML'97) . 1997, 412-420.
  • 10Sebastiani F. Machine learning in automated text categorization [J].ACM Computing Surveys, 2002, 34 (1) : 1-47.

共引文献30

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部