摘要
Feature subset selection is a fundamental problem of data mining. The mutual information of feature subset is a measure for feature subset containing class feature information. A hashing mechanism is proposed to calculate the mutual information of feature subset. The feature relevancy is defined by mutual information. Redundancy-synergy coefficient, a novel redundancy and synergy measure for features to describe the class feature, is defined. In terms of information maximization rule, a bidirectional heuristic feature subset selection method based on mutual information and redundancy-synergy coefficient is presented. This study’s experiments show the good performance of the new method.
Feature subset selection is a fundamental problem of data mining. The mutual information of feature subset is a measure for feature subset containing class feature information. A hashing mechanism is proposed to calculate the mutual information of feature subset. The feature relevancy is defined by mutual information. Redundancy-synergy coefficient, a novel redundancy and synergy measure for features to describe the class feature, is defined. In terms of information maximization rule, a bidirectional heuristic feature subset selection method based on mutual information and redundancy-synergy coefficient is presented. This study' s experiments show the good performance of the new method.
基金
SponsoredbytheNationalNatureScienceFoundationofChina(GrantNo.60075007).
关键词
交互信息
特征选择
模式分类
数据挖掘
mutual information
feature selection
pattern classification
data mining