
Feature selection algorithm for text classification based on improved mutual information (Cited by: 1)

Abstract: To address the poor text classification performance obtained with the traditional mutual information (MI) formula, a feature selection algorithm based on improved mutual information is proposed. The algorithm builds on earlier improved MI methods, which raise the MI value of negative features and take feature frequency into account, and additionally introduces the concepts of concentration degree and dispersion degree. Formulas embodying these two concepts were constructed, and the improved mutual information was implemented on top of them. The resulting feature selection algorithm was applied to a text classifier based on Biomimetic Pattern Recognition and compared with several other feature selection methods. The experimental results showed that the improved MI feature selection method performs substantially better than traditional MI feature selection and also outperforms information gain. By introducing concentration degree and dispersion degree, the improved MI feature selection method greatly improves the performance of the text classification system.
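For orientation, the traditional MI baseline that the abstract refers to can be sketched as follows. This is only a minimal illustration of the standard document-count estimate MI(t, c) = log(P(t, c) / (P(t) P(c))); it is not the paper's improved formula, whose concentration-degree and dispersion-degree terms are not given in this record. The function names, the set-of-tokens document representation, and the "maximum over categories" ranking rule are assumptions made for the example.

```python
import math


def mutual_information(docs, labels, term, category):
    """Traditional MI between a term and a category, estimated from document counts.

    docs is a list of token sets, labels the parallel list of category names
    (both hypothetical inputs chosen for this sketch).
    """
    n = len(docs)
    n_t = sum(1 for d in docs if term in d)              # documents containing the term
    n_c = sum(1 for y in labels if y == category)        # documents in the category
    n_tc = sum(1 for d, y in zip(docs, labels)
               if y == category and term in d)           # documents with both
    if n_tc == 0 or n_t == 0 or n_c == 0:
        return float("-inf")                             # log(0): term never co-occurs
    # log( P(t, c) / (P(t) * P(c)) ) with all probabilities as count ratios
    return math.log((n_tc * n) / (n_t * n_c))


def select_features(docs, labels, k):
    """Rank terms by their maximum MI over all categories and keep the top k."""
    vocab = set().union(*docs)
    categories = set(labels)
    scores = {
        t: max(mutual_information(docs, labels, t, c) for c in categories)
        for t in vocab
    }
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:k]


docs = [{"goal", "match"}, {"goal", "team"}, {"stock", "market"}, {"stock", "team"}]
labels = ["sports", "sports", "finance", "finance"]
print(select_features(docs, labels, 2))
```

In this toy corpus a term that appears in a single document ("match") receives the same score as one that appears in every document of its category ("goal"). The traditional formula ignores how often or how evenly a term occurs, which is plausibly the kind of weakness that frequency-aware corrections aim to address.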
Source: Journal of Harbin Institute of Technology (New Series) (哈尔滨工业大学学报(英文版)), indexed in EI and CAS, 2011, No. 3, pp. 144-148 (5 pages)
Funding: Sponsored by National Natural Science Foundation projects (Grant Nos. 60773070 and 60736044)
Keywords: text classification; feature selection; improved mutual information; Biomimetic Pattern Recognition