An Effective and Efficient Algorithm for Text Categorization

Cited by: 15
Abstract: This paper discusses two widely used algorithms for text categorization: the Vector Space Model (VSM) and k Nearest Neighbor (kNN). VSM is simple and fast, but its precision is often unsatisfactory. kNN, by contrast, spends much more time determining the class label of a query document, but usually achieves better categorization performance. By combining the strengths of the two, we propose a new algorithm, a hybrid of VSM and kNN, and evaluate its effectiveness experimentally. The results show that the new algorithm matches or even exceeds the categorization performance of kNN at a much lower computational cost.
Source: Computer Engineering and Applications (《计算机工程与应用》), CSCD, Peking University Core Journal, 2005, No. 29, pp. 180-183 (4 pages)
Keywords: text categorization, Vector Space Model (VSM), k Nearest Neighbor (kNN)
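The abstract does not say how VSM and kNN are combined, so the following is only a minimal sketch of one plausible combination, not the authors' algorithm. It assumes that "VSM" means a centroid (class-prototype) classifier over term-frequency vectors, and that the hybrid uses the cheap centroid comparison to shortlist a few candidate classes before running kNN only on training documents from those classes. All function names and the `n_candidates` parameter are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tf_vector(tokens):
    """L2-normalised term-frequency vector, stored as a sparse dict."""
    counts = Counter(tokens)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {t: c / norm for t, c in counts.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two sparse unit vectors (their dot product)."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def train(docs):
    """docs: list of (tokens, label). Returns (document vectors, class centroids)."""
    vectors = [(tf_vector(tokens), label) for tokens, label in docs]
    sums = defaultdict(Counter)
    for vec, label in vectors:
        for t, w in vec.items():
            sums[label][t] += w
    centroids = {}
    for label, total in sums.items():
        norm = math.sqrt(sum(w * w for w in total.values()))
        centroids[label] = {t: w / norm for t, w in total.items()} if norm else {}
    return vectors, centroids


def knn_vote(q, pool, k):
    """Similarity-weighted vote among the k documents in pool closest to q."""
    nearest = sorted(pool, key=lambda item: cosine(q, item[0]), reverse=True)[:k]
    votes = defaultdict(float)
    for vec, label in nearest:
        votes[label] += cosine(q, vec)
    return max(votes, key=votes.get)


def classify_vsm(tokens, centroids):
    """Assumed VSM baseline: pick the class whose centroid is most similar."""
    q = tf_vector(tokens)
    return max(centroids, key=lambda label: cosine(q, centroids[label]))


def classify_knn(tokens, vectors, k=3):
    """Plain kNN baseline: compares the query with every training document."""
    return knn_vote(tf_vector(tokens), vectors, k)


def classify_hybrid(tokens, vectors, centroids, k=3, n_candidates=2):
    """Assumed hybrid: centroids shortlist classes, kNN decides among them."""
    q = tf_vector(tokens)
    ranked = sorted(centroids, key=lambda label: cosine(q, centroids[label]),
                    reverse=True)
    candidates = set(ranked[:n_candidates])
    # Only documents from shortlisted classes are compared with the query.
    pool = [(vec, label) for vec, label in vectors if label in candidates]
    return knn_vote(q, pool, k)
```

Under these assumptions the saving comes from the shortlist: the query is compared with one centroid per class and then only with the training documents of the `n_candidates` retained classes, rather than with the whole training set as in plain kNN.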