
A Text Categorization System in C# (文本分类C#实现)
Abstract: This paper designs and implements a text categorization system based on the Vector Space Model (VSM) and Naive Bayes (NB), using a hierarchical, multi-label classification strategy. Four subsystem modules are described in detail: word segmentation and frequency statistics; computation of the final classifier scores between each category and a document; correction of the top-level category using the hierarchical subcategory results; and judgment of whether a document belongs to more than one category. With VSM, the micro-averaged score (MicroF1) is 89.7% for top-level categories and 77.8% for hierarchical subcategories; with Naive Bayes, the corresponding figures are 67.6% and 66.5%.
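Author: Liu Hua (刘华)
Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, Peking University Core Journal, 2007, No. 3, pp. 43-45 (3 pages)
Funding: One of the research outcomes of the Ministry of Education project "National Language Resources Monitoring" (project no. L200401-01-04)
Keywords: Text categorization; Vector space model; Naive Bayes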
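
The micro-averaged figures reported in the abstract can be read against the following minimal sketch of micro-averaged F1 for multi-label output. It is written in Python for brevity rather than the paper's C#. The function name `micro_f1`, the label names, and the toy data are illustrative assumptions and are not taken from the paper, and the abstract does not state the exact formula the author used.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over multi-label assignments.

    gold, predicted: lists of label sets, one set per document.
    Counts are pooled over all documents and labels before
    precision and recall are computed.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels assigned correctly
        fp += len(p - g)   # labels assigned but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: three documents, the second has two labels.
gold = [{"sports"}, {"economy", "politics"}, {"culture"}]
pred = [{"sports"}, {"economy"}, {"politics"}]
score = micro_f1(gold, pred)
```

Because counts are pooled before dividing, frequent categories weigh more heavily in a micro-average than rare ones. That may be part of why the subcategory scores, which are spread over many more classes, are lower than the top-level scores.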

