
Research of Chinese Web classification method based on improved kNN algorithm
Cited by: 20
Abstract: This paper outlines the general procedure of Chinese Web page classification and focuses on its key problems: feature word extraction, construction of the training collection, and the text classification algorithm. In the vector space model representation of text, the number of feature words is closely related to the efficiency of the classification algorithm, so a feature word extraction method based on part of speech is proposed. In addition, the traditional feature-vector comparison method is incorporated into the text similarity computation to improve the k-nearest neighbor (kNN) algorithm. The resulting improved kNN algorithm, based on reducing the number of feature words and on data division, improves the efficiency and performance of classification.
Source: Engineering Journal of Wuhan University (《武汉大学学报(工学版)》), indexed in CAS, CSCD, and the Peking University Core Journals list; 2007, No. 4, pp. 141-144 (4 pages)
Keywords: characteristic words; training collection; text similarity; kNN algorithm
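
The abstract names two ideas: keeping only feature words of certain parts of speech, and classifying with kNN over feature vectors. The sketch below is one possible reading of that pipeline, not the paper's implementation. It assumes the text has already been segmented and part-of-speech tagged, that nouns and verbs are the tags worth keeping, that similarity is plain cosine similarity over term frequencies, and that neighbors vote with their similarity as weight. The tag set, the function names, and the toy data are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Assumption: nouns ("n") and verbs ("v") are the parts of speech kept as feature words.
KEPT_TAGS = {"n", "v"}

def extract_features(tagged_tokens, kept_tags=KEPT_TAGS):
    """Return term-frequency counts of the tokens whose POS tag is kept."""
    return Counter(word for word, tag in tagged_tokens if tag in kept_tags)

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(weight * b[word] for word, weight in a.items() if word in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_classify(query, training, k=3):
    """Label `query` by a similarity-weighted vote among its k nearest training vectors."""
    scored = sorted(
        ((cosine_similarity(query, vec), label) for vec, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = defaultdict(float)
    for sim, label in scored[:k]:
        votes[label] += sim
    return max(votes, key=votes.get) if votes else None

# Hypothetical toy training collection: (feature vector, category) pairs.
training = [
    (extract_features([("football", "n"), ("match", "n"), ("win", "v"), ("the", "u")]), "sports"),
    (extract_features([("team", "n"), ("score", "v"), ("match", "n")]), "sports"),
    (extract_features([("stock", "n"), ("market", "n"), ("rise", "v")]), "finance"),
    (extract_features([("bank", "n"), ("stock", "n"), ("fall", "v")]), "finance"),
]

# The adverb "very" is dropped by the part-of-speech filter before comparison.
query = extract_features([("match", "n"), ("team", "n"), ("very", "d"), ("win", "v")])
predicted = knn_classify(query, training, k=3)
print(predicted)
```

Filtering by part of speech shrinks every vector before any similarity is computed, which is where the efficiency gain described in the abstract would come from under these assumptions.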

