Abstract
With the rapid development of the World Wide Web, text classification has become a key technology for organizing and processing large volumes of document data. As a simple, effective, and nonparametric classification method, kNN is widely used in text classification. However, the kNN classifier is computationally expensive, and an uneven distribution of training samples can reduce classification accuracy. To address these two problems, a density-based method for trimming the training samples of a kNN classifier is proposed. It not only reduces the computational cost of the kNN method, but also makes the density of the training samples more uniform, reducing the misclassification of test samples near class boundaries. Experimental results show that the method performs well.
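The abstract describes the density-based pruning of kNN training samples only at a high level. The sketch below is one plausible reading, not the paper's exact algorithm: the function names, the local-density estimate (inverse mean distance to the k nearest same-class neighbors), and the `factor` threshold are all illustrative assumptions.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classic kNN: majority vote among the k nearest training samples."""
    d = np.linalg.norm(X_train - x, axis=1)
    idx = np.argsort(d)[:k]
    labels, counts = np.unique(y_train[idx], return_counts=True)
    return labels[np.argmax(counts)]

def local_density(X, k=3):
    """Density of each sample: inverse of the mean distance to its k
    nearest neighbors (self excluded). Larger value = denser region."""
    dens = np.empty(len(X))
    for i in range(len(X)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # exclude the sample itself
        dens[i] = 1.0 / np.mean(np.sort(d)[:k])
    return dens

def prune_training_set(X, y, k=3, factor=1.5):
    """Trim over-dense class regions: within each class, drop points whose
    local density exceeds `factor` times the class median. This shrinks
    the training set and flattens its density distribution."""
    keep = np.ones(len(X), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        if len(idx) < 3:                   # too few samples to estimate density
            continue
        dens = local_density(X[idx], k=min(k, len(idx) - 1))
        keep[idx[dens > factor * np.median(dens)]] = False
    return X[keep], y[keep]
```

After pruning, classification proceeds with plain kNN on the reduced set, so each query needs fewer distance computations, matching the paper's two stated goals of lower cost and more uniform sample density.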
Source
《计算机研究与发展》
EI
CSCD
Peking University Core Journal (北大核心)
2004, No. 4, pp. 539-545 (7 pages)
Journal of Computer Research and Development
Funding
National Natural Science Foundation of China (60173027)
Keywords
text classification
kNN (k nearest neighbor)
fast classification