Abstract
This paper discusses two widely used algorithms for text categorization: the Vector Space Model (VSM) and k-Nearest Neighbor (kNN). The former is simple and fast, but its precision is often unsatisfactory. The latter, by contrast, spends more time determining the class label of a query document but usually achieves much better categorization performance. By combining the strengths of the two, we propose a new algorithm, a hybrid of VSM and kNN. An experimental evaluation shows that the new algorithm matches or even exceeds the categorization performance of kNN at a much lower computational cost.
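The abstract does not say how the two methods are combined, so the following is only an illustrative sketch of the speed/accuracy trade-off it describes, not the authors' algorithm. It assumes that the VSM classifier represents each class by a single summed term vector, and that one hypothetical way to combine the methods is to use those class vectors to shortlist a few candidate classes and then run kNN only on training documents from the shortlisted classes. All names (`cosine`, `centroids`, `knn`, `hybrid`, `top_classes`) and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(train):
    """VSM-style step (assumed): one summed term vector per class.

    The sum is left unnormalized because cosine similarity ignores scale.
    """
    sums = defaultdict(Counter)
    for vec, label in train:
        sums[label].update(vec)
    return dict(sums)


def knn(query, train, k):
    """Plain kNN: majority vote among the k most similar training documents."""
    nearest = sorted(train, key=lambda d: cosine(query, d[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def hybrid(query, train, k=3, top_classes=2):
    """Hypothetical combination, not taken from the paper.

    Shortlist the classes whose vectors are closest to the query,
    then run kNN only on training documents from those classes.
    """
    cents = centroids(train)  # in practice this would be computed once, offline
    shortlist = sorted(cents, key=lambda c: cosine(query, cents[c]),
                       reverse=True)[:top_classes]
    candidates = [d for d in train if d[1] in shortlist]
    return knn(query, candidates, k)


# Toy data (made up): each document is a dict of term weights plus a label.
train = [
    ({"goal": 1, "match": 1}, "sports"),
    ({"team": 1, "goal": 1}, "sports"),
    ({"stock": 1, "market": 1}, "finance"),
    ({"bank": 1, "stock": 1}, "finance"),
    ({"film": 1, "actor": 1}, "culture"),
]
query = {"goal": 1, "team": 1}
print(hybrid(query, train))
```

In this sketch the saving comes from comparing the query against one vector per class first, so the more expensive document-by-document comparison of kNN touches only a subset of the training set. Whether the paper's hybrid works this way cannot be determined from the abstract alone.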
Source
Computer Engineering and Applications (《计算机工程与应用》)
Indexed in: CSCD; Peking University Core Journals (北大核心)
2005, Issue 29, pp. 180-183 (4 pages)