Abstract
Nonparametric Information Theoretic Clustering (NIC) clusters data by maximizing the estimated mutual information between data points and clusters, using a nonparametric estimate of the average cluster entropy, which reduces the need for manual intervention. However, the algorithm assumes that all features of the samples contribute equally to the clustering, an assumption that is inconsistent with existing research results and with practice. This paper therefore proposes R-NIC, a feature-weighted nonparametric clustering algorithm based on ReliefF that accounts for the different influence of each feature dimension on pattern classification. R-NIC uses ReliefF to weight and transform the features, suppressing redundant features and strengthening effective ones, and then applies NIC in the transformed feature space to improve the clustering results. Experimental results on UCI datasets show that R-NIC achieves high clustering performance and outperforms the NIC algorithm.
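The feature-weighting step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses a simplified Relief-style update with one nearest hit and one nearest miss per class, and it assumes class labels `y` are already available (for example from a preliminary clustering), which the abstract does not specify. The function names `relief_weights` and `apply_weights` and the toy data are hypothetical. NIC itself is not implemented here; any clustering routine would then be run on the weighted features.

```python
from collections import Counter


def relief_weights(X, y):
    """Relief-style feature weights (one nearest hit, one nearest miss per class)."""
    n, d = len(X), len(X[0])
    # Per-feature range, used to normalise feature differences into [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    priors = {c: cnt / n for c, cnt in Counter(y).items()}
    w = [0.0] * d
    for i in range(n):
        xi, ci = X[i], y[i]
        hits = [X[k] for k in range(n) if k != i and y[k] == ci]
        if not hits:
            continue  # a singleton class has no hit to compare against
        hit = min(hits, key=lambda r: dist(xi, r))
        # Differences to the nearest same-class sample lower the weight.
        for j in range(d):
            w[j] -= diff(xi, hit, j) / n
        # Differences to the nearest sample of each other class raise it,
        # scaled by that class's prior among the other classes.
        for c, p in priors.items():
            if c == ci:
                continue
            misses = [X[k] for k in range(n) if y[k] == c]
            miss = min(misses, key=lambda r: dist(xi, r))
            scale = p / (1.0 - priors[ci])
            for j in range(d):
                w[j] += scale * diff(xi, miss, j) / n
    return w


def apply_weights(X, w):
    """Scale each feature by its weight; negative weights are clipped to zero."""
    pos = [max(v, 0.0) for v in w]
    return [[v * p for v, p in zip(row, pos)] for row in X]


# Toy data (hypothetical): feature 0 separates the two classes, feature 1 is noise.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [1.0, 0.8], [0.9, 0.2], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
X_weighted = apply_weights(X, weights)
```

On this toy data the separating feature receives the larger weight, and the noisy feature is suppressed in `X_weighted`. Clipping negative weights to zero is one possible design choice for the transformation; the abstract does not say how the paper maps weights onto the feature space.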
Source
《计算机工程》
CAS
CSCD
Peking University Core Journals
2015, Issue 4, pp. 161-165 (5 pages)
Computer Engineering
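Funding
National Natural Science Foundation of China project "Research on Multivariate IB Methods and Algorithms" (61170223)
National Natural Science Foundation of China Joint Fund project "Research on Automatic Mapping of Cross-Media Complex Problems in Scalable Transfer Learning" (U1204610)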
Keywords
unsupervised
clustering
mutual information
Nonparametric Information Theoretic Clustering (NIC) algorithm
accuracy
feature weighting