
An Optimized k-NN Text Categorization Algorithm (cited by 2)

Optimized k-NN Text Categorization Approach
Abstract: k-NN is one of the classical text categorization algorithms and is particularly well suited to handling concept drift, but its slow running speed is a serious drawback. To avoid the curse of dimensionality and improve efficiency, it usually relies on feature selection to reduce the dimensionality of the feature space. Feature selection, however, causes information loss and other problems that harm the overall performance of the classification system. Starting from the sparsity of text vectors, this paper presents several optimizations of the traditional k-NN algorithm. The optimized algorithm simplifies the Euclidean-distance classification model, greatly reducing computational cost and yielding a qualitative improvement in running efficiency. It also discards the feature-selection preprocessing step, thereby completely avoiding the problems feature selection introduces, and its classification performance far exceeds that of ordinary k-NN. Experiments show that the optimized algorithm performs excellently in both accuracy and efficiency; it injects new vitality into the traditional k-NN algorithm and can play a greater role in tackling problems such as concept drift.
Source: Computer Science (《计算机科学》), CSCD, Peking University Core Journal, 2009, No. 10, pp. 217-221 (5 pages)
Keywords: Text categorization, Feature selection, k-NN, Concept drift
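The abstract's sparsity-based simplification of the Euclidean distance can be illustrated with a minimal sketch. The paper's exact formulation is not given here, so the following is only an assumption: it uses the identity ||a−b||² = ||a||² + ||b||² − 2·a·b, so that with precomputed norms the distance computation only touches terms the two sparse vectors actually share. All names (`sq_euclidean`, `knn_classify`, the dict-based sparse representation) are hypothetical, not from the paper.

```python
import heapq
from collections import Counter

def sq_norm(v):
    """Squared L2 norm of a sparse vector (term -> weight dict)."""
    return sum(w * w for w in v.values())

def sq_euclidean(a, b, a_norm, b_norm):
    """Squared Euclidean distance via ||a||^2 + ||b||^2 - 2*a.b.
    Only terms present in both sparse vectors contribute to the dot
    product, so the cost scales with the overlap, not the vocabulary."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    return a_norm + b_norm - 2.0 * dot

def knn_classify(query, train, k=3):
    """train: list of (sparse_vector, label) pairs.
    Returns the majority label among the k nearest neighbors."""
    q_norm = sq_norm(query)
    dists = [(sq_euclidean(query, v, q_norm, sq_norm(v)), label)
             for v, label in train]
    nearest = heapq.nsmallest(k, dists)
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Because squared distance is monotonic in distance, the square root is never needed for ranking neighbors, which is one of the standard ways such a model avoids redundant work on high-dimensional sparse text vectors.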