Abstract
Traditional spectral clustering algorithms require parameters to be set manually. To address this problem, and drawing on the ability of neighborhood rough sets to handle continuous data, this paper proposes a feature selection algorithm based on adaptive neighborhood mutual information and spectral clustering. First, the standard deviation set and the adaptive neighborhood set of each object under an attribute are defined, and uncertainty measures including adaptive neighborhood entropy, average neighborhood entropy, joint entropy, neighborhood conditional entropy, and neighborhood mutual information are given; adaptive neighborhood mutual information is then used to rank features by their relevance to the labels. Second, a shared-nearest-neighbor adaptive spectral clustering algorithm is used to group strongly correlated features into the same feature cluster, so that features in different clusters are strongly dissimilar. Finally, a feature selection algorithm is designed using the minimum redundancy maximum relevance technique. Experimental results on ten datasets, in terms of the number of selected features and classification accuracy, verify the effectiveness of the proposed algorithm.
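The abstract names neighborhood mutual information as the relevance measure but does not reproduce its definition. As a rough illustration only, the sketch below ranks features with the commonly used neighborhood mutual information form, NMI(B; D) = -(1/n) Σ log2(|δ_B(x_i)| · |δ_D(x_i)| / (n · |δ_{B∪D}(x_i)|)). The radius rule (each feature's standard deviation times a hypothetical `radius_scale`) and all function names are assumptions made for this example, not the authors' adaptive neighborhood definition; the spectral clustering and redundancy-removal steps are not shown.

```python
import numpy as np

def neighborhood_mask(X, radius_scale=0.5):
    """n x n boolean matrix; entry (i, j) is True when x_j is a neighbor of x_i.

    ASSUMPTION: the radius of each feature is radius_scale times its standard
    deviation. This stands in for the paper's adaptive neighborhood, whose
    exact definition is not given in the abstract.
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    radii = radius_scale * X.std(axis=0)             # one radius per feature
    diff = np.abs(X[:, None, :] - X[None, :, :])     # pairwise per-feature gaps
    return np.all(diff <= radii, axis=2)

def neighborhood_mutual_information(X, y, radius_scale=0.5):
    """Neighborhood mutual information between feature(s) X and labels y."""
    y = np.asarray(y)
    n = len(y)
    nx = neighborhood_mask(X, radius_scale)          # neighbors in feature space
    ny = y[:, None] == y[None, :]                    # objects with the same label
    size_x = nx.sum(axis=1)
    size_y = ny.sum(axis=1)
    size_xy = (nx & ny).sum(axis=1)                  # always >= 1 (x_i itself)
    return -np.mean(np.log2(size_x * size_y / (n * size_xy)))

def rank_features(X, y, radius_scale=0.5):
    """Return feature indices sorted by decreasing relevance, plus the scores."""
    X = np.asarray(X, dtype=float)
    scores = np.array([
        neighborhood_mutual_information(X[:, j], y, radius_scale)
        for j in range(X.shape[1])
    ])
    return np.argsort(-scores), scores

if __name__ == "__main__":
    # Toy data: feature 0 follows the label, feature 1 is constant.
    X_demo = np.array([[0.0, 5.0], [0.0, 5.0], [1.0, 5.0], [1.0, 5.0]])
    y_demo = np.array([0, 0, 1, 1])
    order, scores = rank_features(X_demo, y_demo)
    print("ranking:", order, "scores:", scores)
```

Under these assumptions, a feature whose neighborhoods coincide with the label classes scores as high as the label entropy, while a constant feature scores zero, which is the behavior a relevance ranking needs.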
Authors
孙林
梁娜
徐久成
SUN Lin; LIANG Na; XU Jiu-cheng (College of Computer and Information Engineering, Henan Normal University, Xinxiang 453007, Henan, China; Henan Engineering Laboratory of Smart Business and Internet of Things Technology, Xinxiang 453007, Henan, China)
Source
《山东大学学报(理学版)》
CAS
CSCD
Peking University Core Journal
2022, No. 12, pp. 13-24 (12 pages)
Journal of Shandong University(Natural Science)
Funding
National Natural Science Foundation of China (62076089, 61976082)
Science and Technology Research Project of Henan Province (212102210136).
Keywords
feature selection
neighborhood rough set
adaptive neighborhood mutual information
spectral clustering
minimum redundancy and maximum correlation