
Research on Text Classification Based on Multiple Representative Points
Abstract: Automatic text classification is an effective tool for organizing and managing information. Traditional classification methods generally cannot deliver both good classification quality and good runtime efficiency. By combining the strengths of two classification methods, Rocchio and KNN, this paper designs a text classification method based on multiple representative points. The method first mines several effective representative points for each class, which may be real documents or virtual points, and then classifies with Rocchio and KNN operating on those representative points. Experiments show that the method reaches satisfactory classification results with less training time, handles the class-imbalance problem well, and achieves classification performance comparable to SVM.
Author: Chen Kehua (陈可华)
Source: Journal of Zhengzhou University (Engineering Science), 2010, No. 6, pp. 116-118, 125 (4 pages). Indexed in CAS and the Peking University Core Journals list (北大核心).
Keywords: text classification; multiple representative points; Rocchio; KNN
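The two-stage idea in the abstract (mine several representative points per class, then classify against those points) can be sketched as follows. This is a minimal illustration and not the paper's implementation: the abstract does not say how the points are mined, so spherical k-means within each class is an assumption here, and the names `mine_representatives`, `fit`, and `predict` are hypothetical. With one point per class and `k=1` the sketch behaves like a Rocchio-style centroid classifier; with more points and larger `k` it behaves like KNN over the reduced set of points.

```python
import numpy as np


def _normalize(X):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, 1e-12)


def mine_representatives(X, n_points, n_iter=20, seed=0):
    """Assumed mining step: spherical k-means inside a single class.

    Returns up to n_points unit-length centroids ("virtual" points).
    """
    rng = np.random.default_rng(seed)
    X = _normalize(np.asarray(X, dtype=float))
    k = min(n_points, len(X))
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        assign = np.argmax(X @ centers.T, axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members):  # keep the old centre if a cluster empties
                centers[j] = members.mean(axis=0)
        centers = _normalize(centers)
    return centers


def fit(X, y, n_points=2):
    """Mine representative points for every class and label them."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    reps, labels = [], []
    for c in np.unique(y):
        pts = mine_representatives(X[y == c], n_points)
        reps.append(pts)
        labels.extend([c] * len(pts))
    return np.vstack(reps), np.array(labels)


def predict(X, reps, rep_labels, k=1):
    """Similarity-weighted vote among the k most similar representative points."""
    sims = _normalize(np.asarray(X, dtype=float)) @ reps.T
    top = np.argsort(-sims, axis=1)[:, :k]
    out = []
    for row, idx in zip(sims, top):
        votes = {}
        for i in idx:
            votes[rep_labels[i]] = votes.get(rep_labels[i], 0.0) + row[i]
        out.append(max(votes, key=votes.get))
    return np.array(out)


# Toy term vectors: class "a" has two sub-topics, class "b" has one.
X_train = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0.1, 0.9, 0],
           [0, 0, 1], [0, 0.1, 0.9]]
y_train = ["a", "a", "a", "a", "b", "b"]
reps, rep_labels = fit(X_train, y_train, n_points=2)
pred = predict([[0.8, 0.2, 0], [0, 0.2, 0.8]], reps, rep_labels)
```

Because prediction compares a document only against the representative points, its cost grows with the number of points and not with the size of the training set, which is one plausible reading of the efficiency claim. Giving every class the same number of points is also one way such a scheme could soften class imbalance, though the abstract does not state the mechanism.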

