
An Effective Text Classification Algorithm Based on Incremental Learning Vector Quantization (cited by: 14)

Improved Growing Learning Vector Quantification for Text Classification
Abstract: KNN is a simple classification method that is widely used in text classification, but it has two problems: a heavy computational load, and a drop in classification accuracy caused by unevenly distributed training documents. To address these problems, the authors combine learning vector quantization (LVQ) with growing neural gas (GNG), following the incremental idea of minimizing learning error, and propose a new growing LVQ method, which they apply to text classification. The algorithm trains selectively on the training samples in a single pass and produces an effective set of representative samples, which gives it a strong learning ability. Experimental results show that the method not only reduces the testing time of KNN but also maintains or even improves classification accuracy.
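The general idea in the abstract (one selective pass over the training set that grows a small set of representative samples, which then stands in for the full set at test time) can be illustrated with a minimal sketch. This is not the authors' algorithm: it omits the learning error, inter-class distance, and learning probability named in the keywords, and uses a plain LVQ1-style update instead. The function names, the learning rate `lr`, and the rule "add a prototype when the nearest one has the wrong class" are all assumptions made for illustration.

```python
import math


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(prototypes, x):
    """Index of the prototype whose vector is closest to x."""
    return min(range(len(prototypes)),
               key=lambda i: distance(prototypes[i][0], x))


def train_growing_lvq(samples, labels, lr=0.1):
    """Single pass over the data, growing a prototype set (illustrative only).

    Assumed rule, not taken from the paper:
    - nearest prototype has the same label: pull it toward the sample
      (LVQ1-style attraction);
    - nearest prototype has a different label: add the sample itself as a
      new prototype (growth step).
    """
    prototypes = []  # each entry is [vector, label]
    for x, y in zip(samples, labels):
        if not prototypes:
            prototypes.append([list(x), y])
            continue
        i = nearest(prototypes, x)
        vec, label = prototypes[i]
        if label == y:
            prototypes[i][0] = [v + lr * (xi - v) for v, xi in zip(vec, x)]
        else:
            prototypes.append([list(x), y])
    return prototypes


def classify(prototypes, x):
    """1-NN over the prototype set instead of the full training set."""
    return prototypes[nearest(prototypes, x)][1]
```

At test time, `classify` compares a document vector only against the prototypes, so the cost scales with the size of the representative set and not with the size of the full training set, which is the source of the speed-up the abstract describes.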
Authors: Wang Xiujun (王修君), Shen Hong (沈鸿)
Source: Chinese Journal of Computers (《计算机学报》), indexed in EI, CSCD, and the Peking University Core Journals list; 2007, No. 8, pp. 1277-1285 (9 pages)
Keywords: learning vector quantization (LVQ); growing neural gas (GNG); learning error; inter-class distance; learning probability


