
Application of an Improved Maximum-Entropy Weighting Algorithm in Text Classification (cited 8 times)

Research of Text Categorization Based on Improved Maximum Entropy Algorithm
Abstract: Traditional text-classification algorithms suffer from poorly defined feature words, overlapping classification results, and low efficiency; different feature words are given the same influence on the result, which lowers accuracy and increases time complexity. To address these problems, this paper proposes an improved maximum-entropy text-classification method. Because the maximum entropy model can integrate all of the relevant and irrelevant probabilistic knowledge that has been observed, it achieves good results on many problems. The proposed method combines the strengths of c-means clustering and the maximum-entropy algorithm: it first takes Shannon entropy as the objective function of the maximum entropy model, simplifying the classifier's expression, and then applies the c-means algorithm to classify the optimal features. Experiments show that the new algorithm obtains the optimal classification feature subset in a shorter time, substantially reducing runtime and improving classification accuracy compared with traditional text classification.
Author: Li Xuexiang (李学相)
Source: Computer Science (《计算机科学》, CSCD, Peking University Core), 2012, No. 6, pp. 210-212 (3 pages)
Funding: Supported by the National High-Tech Research and Development Program of China (863 Program, Grant No. 2007AA010408)
Keywords: Text classification; Maximum entropy algorithm; C-means clustering; Feature selection
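The first step the abstract describes, using Shannon entropy to score features before clustering, can be sketched roughly as follows. This is a minimal illustration, not the paper's actual implementation: the toy term counts and the specific choice of ranking terms by the entropy of their per-class distribution (lower entropy = more class-discriminative) are assumptions for demonstration only.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(p) = -sum(p_i * log2(p_i)), with 0*log(0) := 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def select_features(term_class_counts, k):
    """Rank candidate feature terms by the entropy of their class distribution.

    A term whose occurrences concentrate in few classes has low entropy
    and is assumed to be discriminative, so the k lowest-entropy terms
    are kept as the candidate feature subset for the clustering step.
    """
    scored = []
    for term, counts in term_class_counts.items():
        total = sum(counts)
        probs = [c / total for c in counts]
        scored.append((shannon_entropy(probs), term))
    scored.sort()  # ascending entropy; ties broken alphabetically
    return [term for _, term in scored[:k]]

# Toy data: occurrence counts of each term in two classes (sports, finance)
counts = {
    "goal":   [9, 1],
    "stock":  [1, 9],
    "market": [2, 8],
    "the":    [5, 5],
}
best = select_features(counts, 2)
print(best)  # → ['goal', 'stock']
```

A uniformly distributed term such as "the" scores the maximum entropy (1 bit over two classes) and is discarded, while class-skewed terms survive; the surviving feature subset would then feed the c-means clustering stage.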