Abstract
Travel-Time based Hierarchical Clustering (TTHC) is a potential-energy clustering algorithm that clusters well but cannot identify noisy data points in a dataset. To address this, the paper proposes an anti-noise travel-time based potential-energy clustering algorithm. The parent node of each data point is found from the potential energy values of the data points and the similarity between them, and the distance from each data point to its parent node is computed. A λ value is then derived from this distance and the potential energy of the data point, and an increasing curve is constructed by ordering the λ values. Noise points are identified from the inflection points of this curve and assigned to a new cluster. On the dataset with the noise points removed, hierarchical clustering based on the distance between each data point and its parent node yields the clustering result. Experimental results show that the proposed algorithm identifies the noisy data points in the datasets and thereby achieves better clustering results.
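The noise-identification steps described in the abstract can be sketched roughly as follows. This is only an illustrative reading, not the paper's implementation: the abstract does not give the formulas for the potential, the similarity, the λ value, or the inflection-point criterion, so every such choice below is an assumption. The potential is assumed to be gravity-like, the parent is assumed to be the nearest point of lower potential, λ is assumed to be the parent distance divided by the absolute potential, and the knee is found with a common chord-distance heuristic. The function name `find_noise` is hypothetical.

```python
import numpy as np

def find_noise(X):
    """Return (noise_indices, parent) for points X of shape (n, dim).

    Illustrative sketch only: all formulas below are assumptions.
    """
    n = len(X)
    parent = np.arange(n)
    if n < 3:
        return np.array([], dtype=int), parent

    # Pairwise Euclidean distances; the diagonal is inf so a point ignores itself.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)

    # ASSUMPTION: gravity-like potential, more negative in denser regions.
    phi = -np.sum(1.0 / np.maximum(d, 1e-12), axis=1)

    # ASSUMPTION: the parent is the nearest point with strictly lower potential.
    # The point of lowest potential has no parent; its distance stays 0.
    dist = np.zeros(n)
    for i in range(n):
        lower = np.flatnonzero(phi < phi[i])
        if lower.size:
            j = lower[np.argmin(d[i, lower])]
            parent[i], dist[i] = j, d[i, j]

    # ASSUMPTION: lambda grows with parent distance and shrinks with |potential|.
    lam = dist / np.abs(phi)

    # Sort lambda into an increasing curve; take the knee as the point lying
    # farthest below the chord that joins the two ends of the curve.
    order = np.argsort(lam)
    y = lam[order]
    span = y[-1] - y[0]
    if span == 0:
        return np.array([], dtype=int), parent
    x = np.linspace(0.0, 1.0, n)
    knee = int(np.argmax(x - (y - y[0]) / span))

    # Everything after the knee is treated as noise.
    return np.sort(order[knee + 1:]), parent

# Toy data: a tight 7x7 grid (indices 0..48) plus three distant points (49..51).
g = np.arange(-3, 4) * 0.1
grid = np.array([(a, b) for a in g for b in g])
outliers = np.array([[5.0, 5.0], [-4.0, 6.0], [6.0, -5.0]])
X = np.vstack([grid, outliers])
noise, parent = find_noise(X)
print(noise)
```

In this toy setting the three distant points have a large parent distance and a weak potential, so their assumed λ values sit far above the rest of the curve. The hierarchical clustering step on the remaining points is not sketched here, since the abstract gives no detail on how the parent distances are merged.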
Authors
陆慎涛
葛洪伟
LU Shentao; GE Hongwei (Jiangsu Provincial Engineering Laboratory of Pattern Recognition and Computational Intelligence, Jiangnan University, Wuxi, Jiangsu 214122, China; School of Internet of Things Engineering, Jiangnan University, Wuxi, Jiangsu 214122, China)
Source
《计算机工程》
CAS
CSCD
PKU Core (Peking University Core Journals)
2020, Issue 5, pp. 144-149 (6 pages)
Computer Engineering
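Funding
Jiangsu Provincial Graduate Research and Innovation Program for Regular Universities (KYLX16_0781)
Priority Academic Program Development of Jiangsu Higher Education Institutions.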
Keywords
clustering algorithm
potential energy
Travel-Time based Hierarchical Clustering (TTHC)
noise recognition
datasets