Text classification model based on information theory (cited by: 1)
Abstract: A new text classification model is proposed from the perspective of information theory. The model uses the information that a text provides about a category as the basis for classification, which approaches the text classification problem from a different angle. From a practical standpoint, the model is shown to be related to the traditional naive Bayes model and to the centroid-vector method based on KL distance, and proofs of these relationships are given. Drawing on the basic concepts of generalized information theory, the model is then generalized and the concept of feature weight is introduced: the classification model can be corrected by adjusting the feature weights, which provides a theoretical basis for solving the model-correction problem in text classification.
Source: Computer Engineering and Design (《计算机工程与设计》), CSCD, Peking University Core Journal, 2008, No. 24, pp. 6312-6315 (4 pages)
Funding: National Basic Research Program of China (973 Program), grants 2004CB318109 and 2007CB311100
Keywords: text classification; information theory; generalized information theory; mutual information; information entropy; feature weight
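
The abstract states that the proposed model is related to naive Bayes and to the KL-distance centroid-vector method. The paper's own model and proofs are not reproduced in this record, so the following is only a minimal sketch of the well-known link between those two baselines, under assumptions that are not taken from the paper: a multinomial naive Bayes model with uniform class priors, Laplace smoothing, and toy data. All function names, the `alpha` parameter, and the example documents are hypothetical.

```python
import math
from collections import Counter

def class_centroids(train, alpha=1.0):
    """Build one Laplace-smoothed word distribution (centroid) per class.

    `train` maps a class label to a list of tokenized documents.
    """
    vocab = sorted({w for docs in train.values() for doc in docs for w in doc})
    centroids = {}
    for label, docs in train.items():
        counts = Counter(w for doc in docs for w in doc)
        total = sum(counts.values()) + alpha * len(vocab)
        centroids[label] = {w: (counts[w] + alpha) / total for w in vocab}
    return centroids

def kl_distance(doc, q):
    """KL(p_doc || q), with p_doc the empirical word distribution of the document."""
    counts = Counter(w for w in doc if w in q)  # ignore out-of-vocabulary words
    n = sum(counts.values())
    return sum((c / n) * math.log((c / n) / q[w]) for w, c in counts.items())

def nb_log_likelihood(doc, q):
    """Multinomial naive Bayes log-likelihood (uniform prior assumed, so it is omitted)."""
    return sum(math.log(q[w]) for w in doc if w in q)

def classify_kl(doc, centroids):
    """Pick the class whose centroid has the smallest KL distance to the document."""
    return min(centroids, key=lambda c: kl_distance(doc, centroids[c]))

def classify_nb(doc, centroids):
    """Pick the class with the largest naive Bayes log-likelihood."""
    return max(centroids, key=lambda c: nb_log_likelihood(doc, centroids[c]))

# Toy data, invented for illustration only.
train = {
    "sports": [["ball", "team", "goal"], ["team", "match", "ball"]],
    "politics": [["vote", "party", "election"], ["party", "vote", "law"]],
}
centroids = class_centroids(train)
doc = ["ball", "team", "vote", "ball"]

print(classify_kl(doc, centroids), classify_nb(doc, centroids))
```

For a document with N in-vocabulary tokens, KL(p_doc || q_c) equals the negative entropy of p_doc minus the naive Bayes log-likelihood divided by N. The entropy term does not depend on the class, so under these assumptions both rules rank the classes identically. Feature weights, which the abstract introduces as a generalization, are not modeled in this sketch.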