Automatic Categorization of Short Text Data

Short Text Categorization
Abstract: Taking the automatic categorization of product data in comparison-shopping search as its application setting, this paper examines the categorization of short text. It compares the characteristics of commonly used text categorization algorithms, in particular Naive Bayes (NB) and k-Nearest Neighbor (k-NN), and on that basis proposes a multi-classifier scheme that combines the two: when the NB classification is judged unreliable, the item is reclassified with k-NN, and the intermediate results of NB are fully reused to guide pruning in k-NN. Experimental data show that, at a time complexity close to that of NB, the method markedly improves the precision and recall of short text categorization and meets the requirements of practical application.
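The abstract does not state how NB's reliability is judged or exactly how NB's intermediate results prune the k-NN search, so the Python sketch below illustrates only one plausible reading, under assumptions that are ours rather than the paper's: a multinomial NB with add-one smoothing; NB treated as unreliable when the gap between its two highest posterior probabilities falls below a threshold `margin`; and k-NN restricted to training items from NB's `top_m` highest-ranked classes, with neighbours voting by cosine similarity. The class name `NBkNNClassifier`, its parameters, and the toy product titles are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NBkNNClassifier:
    """Hypothetical NB + k-NN hybrid for short texts (illustrative sketch only).

    Assumptions, not taken from the paper: NB is "unreliable" when its top two
    posterior probabilities differ by less than `margin`, and k-NN is pruned to
    training documents whose class is among NB's `top_m` best classes.
    """

    def __init__(self, k=3, margin=0.3, top_m=2):
        self.k = k
        self.margin = margin
        self.top_m = top_m

    def fit(self, docs, labels):
        # docs: list of token lists (e.g. segmented product titles); labels: class names
        self.train = [(Counter(d), y) for d, y in zip(docs, labels)]
        self.class_count = Counter(labels)
        self.term_count = defaultdict(Counter)  # class -> term frequencies
        for d, y in zip(docs, labels):
            self.term_count[y].update(d)
        self.vocab = {t for d in docs for t in d}
        self.total_terms = {c: sum(tc.values()) for c, tc in self.term_count.items()}
        return self

    def nb_posteriors(self, doc):
        # Multinomial NB with Laplace (add-one) smoothing; unseen terms are ignored.
        n_docs = sum(self.class_count.values())
        v = len(self.vocab)
        log_scores = {}
        for c, count in self.class_count.items():
            s = math.log(count / n_docs)
            for t in doc:
                if t in self.vocab:
                    s += math.log((self.term_count[c][t] + 1) / (self.total_terms[c] + v))
            log_scores[c] = s
        # Shift by the max log score to avoid underflow, then normalize to P(c | doc).
        best = max(log_scores.values())
        exp_scores = {c: math.exp(s - best) for c, s in log_scores.items()}
        z = sum(exp_scores.values())
        return {c: e / z for c, e in exp_scores.items()}

    @staticmethod
    def cosine(a, b):
        # Cosine similarity between two term-frequency Counters.
        dot = sum(a[t] * b[t] for t in a if t in b)
        na = math.sqrt(sum(x * x for x in a.values()))
        nb = math.sqrt(sum(x * x for x in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    def predict(self, doc):
        """Return (label, stage), where stage is "nb" or "knn"."""
        post = self.nb_posteriors(doc)
        ranked = sorted(post, key=post.get, reverse=True)
        # Stage 1: trust NB when its best class clearly beats the runner-up.
        if len(ranked) < 2 or post[ranked[0]] - post[ranked[1]] >= self.margin:
            return ranked[0], "nb"
        # Stage 2: reuse NB's class ranking to prune the k-NN search space.
        candidates = set(ranked[: self.top_m])
        query = Counter(doc)
        neighbours = sorted(
            ((self.cosine(query, vec), y) for vec, y in self.train if y in candidates),
            reverse=True,
        )[: self.k]
        votes = defaultdict(float)
        for sim, y in neighbours:
            votes[y] += sim  # similarity-weighted vote
        if not votes or max(votes.values()) == 0.0:
            return ranked[0], "nb"  # no neighbour shares a term: keep NB's guess
        return max(votes, key=votes.get), "knn"


if __name__ == "__main__":
    docs = [["apple", "iphone", "case"], ["samsung", "galaxy", "case"],
            ["apple", "macbook", "sleeve"], ["dell", "laptop", "sleeve"],
            ["python", "programming", "book"]]
    labels = ["phone", "phone", "laptop", "laptop", "book"]
    clf = NBkNNClassifier().fit(docs, labels)
    print(clf.predict(["iphone", "case"]))            # -> ('phone', 'nb')
    print(clf.predict(["apple", "macbook", "case"]))  # -> ('phone', 'knn')
```

In this sketch, k-NN runs only for items NB is unsure about, and only over training items from NB's leading classes (the "book" item is never compared in the second query), so most items cost roughly what NB alone costs. How much accuracy this buys depends on the threshold and the data, which are not reproduced here.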
Source: Microcomputer Applications (《微型电脑应用》), 2007, No. 2, pp. 19-21, 4-5 (3 pages in total)
Keywords: text categorization; short text; Naive Bayes (NB); k-Nearest Neighbor (k-NN)