
A Feature Selection Algorithm Based on Clustering and Information Entropy

Times cited: 4
Abstract: For categorical data, a definition that measures the significance of a feature is proposed on the basis of information entropy. Combining this measure with cluster analysis, an unsupervised feature selection method is presented. The time complexity of the method is approximately linear in both the size of the dataset and the number of features, which makes it suitable for feature selection in large datasets. Experimental results on UCI datasets show that the method performs well and that the proposed feature selection method is effective and practical.
Source: Journal of Zhengzhou University: Natural Science Edition (indexed in CAS; Peking University Core Journal), 2009, No. 1, pp. 77-80 (4 pages).
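Funding: National Natural Science Foundation of China (Grant No. 60673191); Key Natural Science Research Project of Higher Education Institutions of Guangdong Province (Grant No. 06Z012); Research Innovation Team Project of Guangdong University of Foreign Studies (Grant No. GW2006-TA-005).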
Keywords: clustering; information entropy; feature selection; large dataset
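The abstract does not state the paper's significance measure, so what follows is only a minimal sketch of one common entropy-based criterion, under explicit assumptions. Each categorical feature F is scored by its mutual information with the cluster labels C, I(F; C) = H(F) - H(F | C), using counts gathered in a single pass over the data. This stand-in criterion is not necessarily the authors' definition. The function names `feature_significance` and `select_features` are hypothetical. The cluster labels are assumed to come from a separate categorical clustering step (for example k-modes). With n records and m features, the counting pass costs O(n * m), which is consistent with the near-linear complexity the abstract claims.

```python
from collections import Counter, defaultdict
from math import log2
from typing import Hashable, Sequence


def entropy(counts: Counter, total: int) -> float:
    """Shannon entropy in bits of a distribution given as value counts."""
    return -sum((c / total) * log2(c / total) for c in counts.values() if c > 0)


def feature_significance(
    rows: Sequence[Sequence[Hashable]], labels: Sequence[Hashable]
) -> list[float]:
    """Hypothetical criterion: score feature j by I(F_j; C) = H(F_j) - H(F_j | C).

    Assumes a non-empty table of categorical values and one cluster label per
    row (labels produced by a separate clustering step, e.g. k-modes).
    """
    n, m = len(rows), len(rows[0])
    feat_counts = [Counter() for _ in range(m)]             # value counts of F_j
    cond_counts = [defaultdict(Counter) for _ in range(m)]  # value counts of F_j within each cluster
    cluster_sizes = Counter(labels)

    # Single pass over all n * m cells: this is the O(n * m) part.
    for row, c in zip(rows, labels):
        for j, v in enumerate(row):
            feat_counts[j][v] += 1
            cond_counts[j][c][v] += 1

    scores = []
    for j in range(m):
        h_f = entropy(feat_counts[j], n)
        # H(F_j | C) = sum over clusters c of P(c) * H(F_j | C = c)
        h_f_given_c = sum(
            (cluster_sizes[c] / n) * entropy(cnt, cluster_sizes[c])
            for c, cnt in cond_counts[j].items()
        )
        scores.append(h_f - h_f_given_c)
    return scores


def select_features(scores: Sequence[float], k: int) -> list[int]:
    """Indices of the k highest-scoring features (ties keep column order)."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]


if __name__ == "__main__":
    rows = [
        ("red", "small", "x"),
        ("red", "small", "y"),
        ("blue", "large", "x"),
        ("blue", "large", "y"),
    ]
    labels = [0, 0, 1, 1]  # assumed output of a clustering step
    scores = feature_significance(rows, labels)
    print(scores)
    print(select_features(scores, 2))
```

In this toy table the first two features separate the two clusters perfectly and the third is unrelated to them. The scores are therefore 1.0, 1.0 and 0.0 bits, and keeping k = 2 features retains columns 0 and 1. Scoring against cluster labels keeps the procedure unsupervised, since no class attribute is used.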