Feature selection method based on integration of mutual information and fuzzy C-means clustering
Cited by: 2
Abstract: A large number of redundant features in a large dataset can degrade classification performance. To address this problem, an automatic feature selection method named FCC-MI, based on the integration of Mutual Information (MI) and Fuzzy C-Means (FCM) clustering, is proposed. First, MI and its correlation function are analyzed, and the features are sorted by correlation value. Second, the data are grouped according to the feature with the maximum correlation, and the optimal number of features is determined automatically by FCM clustering. Finally, the features are selected on the basis of their correlation values. Experiments on seven datasets from the UCI machine learning repository compare FCC-MI with three methods from the literature: WCMFS (Within-class variance and Correlation Measure Feature Selection), B-AMBDMI (Based on Approximating Markov Blanket and Dynamic Mutual Information), and T-MI-GA (Two-stage feature selection algorithm based on MI and GA). Theoretical analysis and experimental results show that FCC-MI not only improves the efficiency of data classification but also determines the optimal feature subset automatically while maintaining classification accuracy, which reduces the number of features in the dataset. The method is therefore suitable for feature reduction and analysis of massive data with highly correlated features.
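The first step of the abstract, sorting features by an MI-based correlation value, can be illustrated with a minimal sketch. It assumes discrete features and uses the plain empirical mutual information between each feature and the class label as the score. The abstract does not give the paper's exact correlation function, so this is a stand-in and not the authors' formula. The names `mutual_information` and `rank_features` and the toy data are hypothetical, and the FCM step is not shown.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information I(X;Y), in bits, of two discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(rows, labels):
    """Return (feature indices sorted by decreasing MI with the label, MI scores)."""
    n_features = len(rows[0])
    scores = [
        mutual_information([row[j] for row in rows], labels)
        for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return order, scores


# Toy data (assumed for illustration): feature 0 copies the label,
# feature 1 is independent of it, feature 2 is partly informative.
rows = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 0)]
labels = [0, 0, 1, 1]
order, scores = rank_features(rows, labels)
print(order, [round(s, 3) for s in scores])
```

On this toy data the feature that copies the label ranks first and the independent feature ranks last. Continuous features would need discretization or a density-based MI estimator before this score could be applied.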
Authors: Zhu Jiewen (朱接文), Xiao Jun (肖军)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list; 2014, Issue 9, pp. 2608-2611, 2649 (5 pages)
Keywords: Mutual Information (MI); feature selection; Fuzzy C-Means (FCM) clustering; data grouping