Text Classification Model Based on Information Theory (cited by: 1)


Abstract: This paper proposes a new text classification model from the perspective of information theory. The model uses the information that a text provides about each category as the basis for classification, which offers a different way of looking at the text classification problem. From a practical standpoint, the model is related to the traditional naive Bayes model and to the KL-distance-based centroid vector method, and a proof of this relationship is given. Drawing on the basic concepts of generalized information theory, the model is then extended by introducing the concept of feature weight: the classification model can be revised by adjusting the feature weights, which provides a theoretical basis for solving the model revision problem in text classification.
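The abstract's claim that the model relates to both naive Bayes and the KL-distance centroid method can be illustrated with a well-known identity, sketched below. This is not the paper's model or its proof, which the abstract does not spell out. It only shows that, with uniform class priors, ranking classes by multinomial naive Bayes log-likelihood gives the same result as ranking them by the KL divergence from the document's empirical word distribution to each class's word distribution. The toy corpus, the Laplace smoothing constant `alpha`, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter

def train(docs_by_class, vocab, alpha=1.0):
    """Estimate Laplace-smoothed word distributions P(w|c), one per class."""
    model = {}
    for c, docs in docs_by_class.items():
        counts = Counter(w for d in docs for w in d if w in vocab)
        total = sum(counts.values()) + alpha * len(vocab)
        model[c] = {w: (counts[w] + alpha) / total for w in vocab}
    return model

def nb_log_likelihood(doc, p_w_c):
    """Multinomial naive Bayes score with a uniform prior: sum_w n_w * log P(w|c)."""
    counts = Counter(w for w in doc if w in p_w_c)
    return sum(n * math.log(p_w_c[w]) for w, n in counts.items())

def kl_to_centroid(doc, p_w_c):
    """KL(p_d || P(.|c)), where p_d is the document's empirical word distribution."""
    counts = Counter(w for w in doc if w in p_w_c)
    n = sum(counts.values())
    return sum((k / n) * math.log((k / n) / p_w_c[w]) for w, k in counts.items())

# Hypothetical toy corpus: two classes, two tokenized documents each.
docs_by_class = {
    "sports": [["ball", "team", "goal"], ["team", "win", "goal"]],
    "finance": [["stock", "market", "price"], ["market", "bank", "price"]],
}
vocab = {w for docs in docs_by_class.values() for d in docs for w in d}
model = train(docs_by_class, vocab)

doc = ["team", "goal", "goal", "market"]
nb_best = max(model, key=lambda c: nb_log_likelihood(doc, model[c]))
kl_best = min(model, key=lambda c: kl_to_centroid(doc, model[c]))
print(nb_best, kl_best)
```

The two rankings agree because the log-likelihood equals the negative document length times the KL divergence, plus a term (the document's own negative entropy scaled by its length) that does not depend on the class. Non-uniform priors or a different smoothing scheme would break the exact equivalence.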

Authors: Tang Liang (唐亮), Duan Jianguo (段建国), Xu Hongbo (许洪波), Liang Ling (梁玲)

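Affiliations: Institute of Information Engineering, PLA Information Engineering University; Department of Network Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences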

Source: Computer Engineering and Design (《计算机工程与设计》), indexed in CSCD and the Peking University Core Journals list, 2008, No. 24, pp. 6312-6315 (4 pages)

Funding: National Basic Research Program of China (973 Program), grants 2004CB318109 and 2007CB311100

Keywords: text classification; information theory; generalized information theory; mutual information; information entropy; feature weight

Classification number: TP391.1 [Automation and Computer Technology: Computer Application Technology]




