
A Density-Based Method for Reducing the Amount of Training Data in kNN Text Classification

Cited by: 98
Abstract: With the rapid development of the World Wide Web, text classification has become a key technology for organizing and processing large amounts of document data. As a simple, effective, and nonparametric classification method, the kNN method is widely used in text classification. However, the kNN classifier not only has large computational demands, but can also lose classification accuracy when the training samples are unevenly distributed. To address these two problems, this paper presents a density-based method for trimming the training samples of a kNN classifier. The method both reduces the computational cost of kNN classification and makes the distribution density of the training samples more even, which decreases the misclassification of test samples near class boundaries. Experimental results show that the method performs well.
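The idea described in the abstract can be illustrated with a small sketch (not the authors' exact algorithm; the radius `eps`, the `max_density` threshold, and all function names are illustrative assumptions): estimate each training sample's local density by counting neighbors within a fixed radius, greedily drop samples from over-dense regions until the retained set is roughly uniform, then run an ordinary kNN classifier on the thinned set.

```python
import numpy as np

def local_density(X, eps):
    """Local density of each sample: the number of other samples
    lying within radius eps (a simple fixed-radius estimate)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return (d < eps).sum(axis=1) - 1  # subtract 1 for the sample itself

def prune_dense(X, y, eps, max_density):
    """Greedily drop samples from over-dense regions, densest first,
    until every retained sample has at most max_density retained
    neighbors within eps. This evens out the training distribution."""
    keep = np.ones(len(X), dtype=bool)
    for i in np.argsort(-local_density(X, eps)):      # densest first
        d = np.linalg.norm(X[keep] - X[i], axis=1)
        if (d < eps).sum() - 1 > max_density:         # neighbors among kept
            keep[i] = False
    return X[keep], y[keep]

def knn_predict(X_train, y_train, x, k=3):
    """Plain kNN: majority label among the k nearest training samples."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]
```

Because a sample is only removed while more than `max_density` of its neighbors remain, every neighborhood keeps a representative subset: dense, redundant regions are thinned aggressively while sparse regions (often near class boundaries) are left intact, shrinking both the per-query distance computations and the bias toward over-represented classes.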
Source: 《计算机研究与发展》 (Journal of Computer Research and Development), EI / CSCD / Peking University core journal, 2004, Issue 4, pp. 539-545 (7 pages)
Funding: National Natural Science Foundation of China (grant 60173027)
Keywords: text classification; k nearest neighbor (kNN); fast classification

References (13)

  • [1] D D Lewis. Naive (Bayes) at forty: The independence assumption in information retrieval. In: The 10th European Conf on Machine Learning (ECML-98). New York: Springer-Verlag, 1998. 4-15
  • [2] Y Yang, X Lin. A re-examination of text categorization methods. In: The 22nd Annual Int'l ACM SIGIR Conf on Research and Development in Information Retrieval. New York: ACM Press, 1999
  • [3] Y Yang, C G Chute. An example-based mapping method for text categorization and retrieval. ACM Trans on Information Systems, 1994, 12(3): 252-277
  • [4] E Wiener. A neural network approach to topic spotting. In: The 4th Annual Symp on Document Analysis and Information Retrieval (SDAIR-95), Las Vegas, NV, 1995
  • [5] R E Schapire, Y Singer. Improved boosting algorithms using confidence-rated predictions. In: Proc of the 11th Annual Conf on Computational Learning Theory. Madison: ACM Press, 1998. 80-91
  • [6] T Joachims. Text categorization with support vector machines: Learning with many relevant features. In: The 10th European Conf on Machine Learning (ECML-98). Berlin: Springer, 1998. 137-142
  • [7] S O Belkasim, M Shridhar, M Ahmadi. Pattern classification using an efficient KNNR. Pattern Recognition Letters, 1992, 25(10): 1269-1273
  • [8] V E Ruiz. An algorithm for finding nearest neighbors in (approximately) constant average time. Pattern Recognition Letters, 1986, 4(3): 145-147
  • [9] P E Hart. The condensed nearest neighbor rule. IEEE Trans on Information Theory, 1968, IT-14(3): 515-516
  • [10] D L Wilson. Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans on Systems, Man and Cybernetics, 1972, 2(3): 408-421

Co-cited references: 841

Citing articles: 98

Secondary citing articles: 809
