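Abstract
Cluster analysis is an important research direction in data mining and knowledge discovery. Similarity is a core concept in most clustering algorithms, in which the similarity between objects is computed either directly or indirectly. Traditional similarity measures mostly observe the two objects being compared at a single granularity. In human cognition, however, multiple granularities are usually used to solve problems more reasonably and effectively. Drawing on this multi-granulation cognitive mechanism, this paper proposes a new similarity learning method, called the whole-granulation similarity measure, and develops a whole-granulation clustering algorithm based on it. The whole-granulation similarity measure observes the compared objects from all perspectives and thereby obtains a more faithful similarity between two objects. Experiments are conducted on five data sets selected from UCI, and a comparison with two traditional clustering methods verifies the rationality and effectiveness of the whole-granulation clustering algorithm.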
In cluster analysis, and especially when clustering is cast as an optimization process, one of the decisive factors is the similarity measure employed in the clustering criterion function. All clustering methods proposed so far have to assume some relationship among the data objects they are applied to, and the similarity between every pair of objects is computed either explicitly or implicitly. Hence, whether the similarity measure correctly describes the structure of the data determines the effectiveness of a clustering algorithm. In addition, multi-granulation cognition, an important characteristic of human cognition, plays a key role in data modeling. Because it analyzes a problem from multiple perspectives and at multiple levels, multi-granulation analysis can obtain more reasonable and more satisfactory solutions. Drawing on this human multi-granulation cognitive ability, in this paper we introduce a novel similarity measure, called the whole-granulation similarity measure, and apply it in the clustering criterion function to obtain a clustering algorithm, called the whole-granulation clustering algorithm, in order to verify the rationality of the measure. Traditional dissimilarity/similarity measures use only a single viewpoint, usually the origin. Because the whole-granulation measure takes all viewpoints into consideration, a more informative assessment of similarity can be achieved. As a leading partitional clustering technique, k-means is one of the most widely used algorithms, because it is fast and easy to combine with other methods. Much research has advanced k-means by improving its heuristic function or by combining it with other methods, and this remains an active area of clustering research. Following this approach, we introduce our measure into cluster analysis through the k-means algorithm as an initial test. Experiments are conducted on five data sets selected from the UCI Machine Learning Repository. Finally, the whole-granulation clustering algorithm is compared with two traditional clustering algorithms to verify its validity and, at the same time, the rationality of the whole-granulation similarity measure. A convergence experiment further shows that the whole-granulation similarity measure performs strongly as a way of measuring similarity.
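The idea of plugging a similarity measure into the k-means criterion function can be illustrated with a minimal sketch. The abstract does not define the whole-granulation measure, so the code below only shows where such a measure would enter: the `dissimilarity` parameter, the function names, and the toy data are all hypothetical, and squared Euclidean distance is used purely as a single-viewpoint placeholder.

```python
import random
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Dissimilarity = Callable[[Point, Point], float]


def squared_euclidean(a: Point, b: Point) -> float:
    """Placeholder measure; NOT the whole-granulation measure from the paper."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points: List[Point]) -> List[float]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(column) / n for column in zip(*points)]


def kmeans(
    data: List[Point],
    k: int,
    dissimilarity: Dissimilarity = squared_euclidean,
    max_iter: int = 100,
    seed: int = 0,
) -> Tuple[List[int], List[List[float]]]:
    """Lloyd-style k-means whose assignment step uses a pluggable measure."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct data points.
    centroids = [list(p) for p in rng.sample(list(data), k)]
    labels = [-1] * len(data)
    for _ in range(max_iter):
        # Assignment step: each point goes to its least dissimilar centroid.
        new_labels = [
            min(range(k), key=lambda j: dissimilarity(p, centroids[j]))
            for p in data
        ]
        if new_labels == labels:
            break  # assignments are stable, so the procedure has converged
        labels = new_labels
        # Update step: recompute each non-empty cluster's centroid as its mean.
        for j in range(k):
            members = [p for p, label in zip(data, labels) if label == j]
            if members:
                centroids[j] = mean_point(members)
    return labels, centroids


# Toy usage with two well-separated groups (illustrative data only).
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_labels, toy_centroids = kmeans(toy, k=2)
```

Note that the mean is the exact minimiser of the criterion only for squared Euclidean distance; a different measure passed as `dissimilarity` may call for a different centroid update rule.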
Source
Journal of Nanjing University (Natural Science) (《南京大学学报(自然科学版)》)
CAS
CSCD
PKU Core (北大核心)
2014, No. 4, pp. 505-516 (12 pages)
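Funding
Specialized Research Fund for the Doctoral Program of Higher Education (20121401110013)
Program for New Century Excellent Talents in University (NCET-12-1031)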
Keywords
similarity measure
cluster analysis
whole-granulation