Fast Text Classification Based on Dynamical Generation of Representative Samples
Abstract: The k-nearest neighbor (kNN) method is simple, effective, and nonparametric, and it is widely used in text classification, but its computational cost is high. To address this weakness, this paper proposes a new fast text classification method. The method first generates representative samples by training on the original training set, and then adjusts them repeatedly, according to the distribution of the original training samples relative to the representatives generated so far, so that they become more representative. This effectively compresses the original training set and improves classification efficiency; at the same time, because the representative samples are distributed more reasonably, classification accuracy can also improve. Experimental results show that the method achieves good classification performance.
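Authors: Hua Bei (华北), Cao Xianbin (曹先彬)
Source: Computer Simulation (《计算机仿真》), CSCD, 2007, No. 6, pp. 322-325 (4 pages)
Funding: National Natural Science Foundation of China (60204009); Open Fund of the Key Laboratory of Complex Systems and Intelligence Science, Chinese Academy of Sciences (20040104); 973 Program project (2004CB318109).
Keywords: text classification; representative samples; fast classification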
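
The abstract does not say how the representative samples are generated or adjusted, so the following is only a minimal sketch of the general idea (compress the training set into a few adjusted representatives, then classify against those instead of every training document). The per-class, k-means-style update, the cosine similarity measure, the function names `build_representatives` and `classify`, the parameters `per_class` and `rounds`, and the toy data are all assumptions made for illustration, not the authors' algorithm.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_representatives(samples, labels, per_class=2, rounds=5, seed=0):
    """Hypothetical generation and repeated adjustment of representatives.

    Assumption: each class starts from `per_class` randomly chosen training
    vectors; each is then moved to the mean of the same-class samples that
    are most similar to it, for `rounds` passes.
    """
    rng = random.Random(seed)
    reps = []  # list of (vector, label) pairs
    for label in sorted(set(labels)):
        members = [s for s, lab in zip(samples, labels) if lab == label]
        centers = rng.sample(members, min(per_class, len(members)))
        for _ in range(rounds):
            groups = [[] for _ in centers]
            for m in members:
                best = max(range(len(centers)),
                           key=lambda i: cosine(m, centers[i]))
                groups[best].append(m)
            # Keep the old center when no sample was assigned to it.
            centers = [mean(g) if g else c for g, c in zip(groups, centers)]
        reps.extend((c, label) for c in centers)
    return reps


def classify(vector, reps):
    """Return the label of the most similar representative."""
    return max(reps, key=lambda r: cosine(vector, r[0]))[1]


# Toy term-frequency vectors over a 3-word vocabulary (made-up data).
train = [[3, 0, 1], [4, 1, 0], [5, 0, 0], [0, 3, 1], [1, 4, 0], [0, 5, 1]]
train_labels = ["sports", "sports", "sports",
                "finance", "finance", "finance"]

reps = build_representatives(train, train_labels, per_class=1)
print(len(reps), classify([2, 0, 0], reps), classify([0, 2, 0], reps))
```

In this sketch a query is compared against two representatives rather than six training vectors, which is the source of the speed-up the abstract describes; how closely accuracy tracks full kNN would depend on the actual generation and adjustment rules used in the paper.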
