Abstract
Data sets often exhibit dependencies among attributes and correlations among objects. To account for both, this paper defines a new similarity relation model whose similarity relation reflects the natural correlation between objects. On this basis, we propose a natural clustering algorithm based on attribute dependency and object correlation. Without the number of clusters being specified in advance, the algorithm naturally groups all objects whose similarity reaches a set threshold into one cluster; adjusting the similarity threshold yields clusterings at different granularities. Comparative experiments on numeric and categorical data sets show that, compared with other clustering algorithms, the natural clustering algorithm faithfully reflects the correlations in the data and the natural cluster structure of the data set, can discover clusters of arbitrary shape, and effectively improves clustering accuracy and quality.
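The threshold-driven grouping described in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not give the similarity relation model, so the `similarity` function below is a placeholder assumption (inverse Euclidean distance), and the function names and the choice to merge objects transitively through chains of similar pairs are likewise assumptions made for illustration.

```python
from itertools import combinations


def similarity(a, b):
    """Placeholder similarity in (0, 1]: 1 / (1 + Euclidean distance).

    Assumption for illustration only; the paper defines its own model
    based on attribute dependency and object correlation.
    """
    dist = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return 1.0 / (1.0 + dist)


def threshold_clusters(points, threshold):
    """Group point indices so that any pair with similarity >= threshold
    ends up in the same cluster (merged transitively via union-find).
    The number of clusters is never specified; it follows from the threshold.
    """
    parent = list(range(len(points)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if similarity(points[i], points[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


points = [(0, 0), (0, 1), (0, 2), (10, 10), (10, 11)]
for t in (0.9, 0.5, 0.05):
    print(t, threshold_clusters(points, t))
```

Raising the threshold in this sketch splits the data into finer clusters and lowering it merges them, which mirrors the granularity behavior the abstract describes. Because pairs are merged transitively, a cluster can follow a chain of nearby objects rather than a fixed shape.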
Source
《小型微型计算机系统》
CSCD
Peking University Core Journal
2015, No. 4, pp. 810-814 (5 pages)
Journal of Chinese Computer Systems
Funding
Supported by the National Natural Science Foundation of China (Grant No. 61272109)
Keywords
attribute dependency
object correlation
similarity
objective function
natural clustering