Abstract
The feature weighting scheme strongly affects clustering results, yet traditional feature weighting algorithms ignore how feature values are distributed between and within classes. This paper therefore studies how the degree of order that sample feature attributes exhibit after clustering affects the clustering result, analyses the distribution of those attributes after clustering, and proposes an adaptive feature entropy weight fuzzy C-means clustering algorithm (AEWFCM). The algorithm uses the post-clustering feature entropy and information gain as the criteria for adjusting the feature weights, and alternates clustering and weight updates iteratively until the optimal feature weights are obtained. Experimental results show that AEWFCM effectively distinguishes how important each feature attribute is to the clustering result and that, compared with other weighted fuzzy C-means clustering algorithms, it achieves higher clustering accuracy on the same samples.
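The alternation between clustering and weight updating described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's formulas, so everything specific here is an assumption made for illustration: the histogram-based entropy estimate, the use of normalised information gain as the weight, the farthest-point initialisation, and all function names (`weighted_fcm`, `info_gain_weights`, `adaptive_entropy_weight_fcm`). It should not be read as the authors' AEWFCM.

```python
import math


def dist2(x, v, w):
    """Feature-weighted squared Euclidean distance."""
    return sum(wk * (a - b) ** 2 for wk, a, b in zip(w, x, v))


def weighted_fcm(X, c, w, m=2.0, iters=50):
    """Fuzzy C-means under the weighted distance; returns (U, centers)."""
    n, d = len(X), len(X[0])
    # Deterministic farthest-point initialisation (an illustrative choice).
    centers = [list(X[0])]
    while len(centers) < c:
        far = max(X, key=lambda x: min(dist2(x, v, w) for v in centers))
        centers.append(list(far))
    U = []
    for _ in range(iters):
        # Membership update: u_ij is proportional to D_ij ** (-1 / (m - 1)).
        U = []
        for x in X:
            inv = [max(dist2(x, v, w), 1e-12) ** (-1.0 / (m - 1)) for v in centers]
            s = sum(inv)
            U.append([t / s for t in inv])
        # Center update: mean of the samples weighted by u_ij ** m.
        for j in range(c):
            um = [U[i][j] ** m for i in range(n)]
            tot = sum(um)
            centers[j] = [sum(um[i] * X[i][k] for i in range(n)) / tot for k in range(d)]
    return U, centers


def entropy(values, bins, lo, hi):
    """Shannon entropy of a histogram of `values` over [lo, hi]."""
    width = (hi - lo) / bins or 1.0
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    n = len(values)
    return -sum(cnt / n * math.log(cnt / n) for cnt in counts if cnt)


def info_gain_weights(X, labels, c, bins=4):
    """Assumed rule: weight = normalised gain H(feature) - H(feature | cluster)."""
    n, d = len(X), len(X[0])
    gains = []
    for k in range(d):
        col = [x[k] for x in X]
        lo, hi = min(col), max(col)
        h_cond = 0.0
        for j in range(c):
            sub = [col[i] for i in range(n) if labels[i] == j]
            if sub:
                h_cond += len(sub) / n * entropy(sub, bins, lo, hi)
        gains.append(max(entropy(col, bins, lo, hi) - h_cond, 0.0))
    total = sum(gains)
    return [g / total for g in gains] if total > 0 else [1.0 / d] * d


def adaptive_entropy_weight_fcm(X, c, rounds=10, tol=1e-4):
    """Alternate clustering and weight updates until the weights settle."""
    d = len(X[0])
    w = [1.0 / d] * d
    for _ in range(rounds):
        U, centers = weighted_fcm(X, c, w)
        labels = [max(range(c), key=lambda j: u[j]) for u in U]
        new_w = info_gain_weights(X, labels, c)
        converged = max(abs(a - b) for a, b in zip(w, new_w)) < tol
        w = new_w
        if converged:
            break
    return w, labels, centers
```

In this sketch, a feature whose values become well ordered within clusters (low conditional entropy, hence high information gain) receives a larger weight in the next clustering pass, which is one plausible reading of "feature entropy and information gain as the criteria for adjusting the feature weights".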
Source
《系统工程理论与实践》
EI
CSSCI
CSCD
Peking University Core Journals
2016, Issue 1, pp. 219-223 (5 pages)
Systems Engineering-Theory & Practice
Funding
National Natural Science Foundation of China (6123307)
Keywords
fuzzy C-means clustering
adaptive
feature weighting
entropy