Abstract
Traditional density-based clustering algorithms for uncertain data are sensitive to their parameters and produce poor clustering results on uncertain data sets with complex manifold structure. To address these defects, a new density peak clustering algorithm for uncertain data based on Jensen-Shannon (JS) divergence (UDPC-JS) is proposed. First, the algorithm removes noise points using an uncertain natural neighborhood density factor, which is defined by uncertain natural neighbors. Second, it computes the local density of each uncertain data object by combining uncertain natural neighbors with JS divergence, finds the initial cluster centers of the uncertain data set using the idea of representative points, and defines a distance based on JS divergence and a graph between the initial cluster centers. Third, it builds a decision graph over the initial cluster centers from this local density and the newly defined JS-and-graph-based distance, and selects the final cluster centers from the decision graph. Finally, each unassigned uncertain data object is assigned to the cluster that contains its initial cluster center. Experimental results show that the algorithm achieves better clustering quality and accuracy than the compared algorithms, with a particularly large advantage on uncertain data sets with complex manifold structure.
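The abstract names the building blocks of UDPC-JS but gives no formulas. The Python sketch below illustrates three of them under assumptions that are not taken from the paper: each uncertain object is represented as a normalized histogram over a grid shared by all objects; JS divergence (natural log, so bounded by ln 2) serves as the dissimilarity; natural neighbors come from the standard natural-neighbor search (grow r until every object has a reverse neighbor or the number of objects without one stops changing); and the local density is a hypothetical stand-in, the inverse mean JS divergence to an object's natural neighbors. The decision-graph quantities are the usual density-peak ρ and δ computed over all objects with plain JS divergence, not the paper's graph-based distance between initial cluster centers; representative-point selection and the noise factor are omitted. All function names are hypothetical.

```python
import numpy as np

# Assumption (not from the paper): every uncertain object is a normalized
# histogram of its possible values on a grid shared by all objects.

def js_divergence(p, q):
    """Jensen-Shannon divergence with natural log (bounded by ln 2)."""
    p = np.asarray(p, dtype=float) / np.sum(p)
    q = np.asarray(q, dtype=float) / np.sum(q)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # 0 * log 0 = 0; b = m is positive wherever a is
        return float(np.sum(a[mask] * np.log(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def js_matrix(hists):
    """Symmetric matrix of pairwise JS divergences."""
    n = len(hists)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = js_divergence(hists[i], hists[j])
    return d


def natural_neighbors(d):
    """Grow r until every object has a reverse neighbor or the count of
    objects without one stops changing; return r and mutual r-NN sets."""
    n = d.shape[0]
    dd = d.copy()
    np.fill_diagonal(dd, np.inf)  # an object is never its own neighbor
    order = np.argsort(dd, axis=1, kind="stable")
    reverse_count = np.zeros(n, dtype=int)
    prev_zero, r = -1, 0
    while r < n - 1:
        r += 1
        for i in range(n):
            reverse_count[order[i, r - 1]] += 1
        zero = int(np.sum(reverse_count == 0))
        if zero == 0 or zero == prev_zero:
            break
        prev_zero = zero
    knn = [set(order[i, :r].tolist()) for i in range(n)]
    return r, [{j for j in knn[i] if i in knn[j]} for i in range(n)]


def local_density(d, nns):
    """Hypothetical density: inverse mean JS divergence to natural neighbors.
    Objects without natural neighbors get 0 (noise candidates)."""
    rho = np.zeros(d.shape[0])
    for i, nb in enumerate(nns):
        if nb:
            rho[i] = 1.0 / (np.mean([d[i, j] for j in nb]) + 1e-12)
    return rho


def peak_distance(d, rho):
    """Density-peak delta: JS divergence to the nearest denser object."""
    delta = np.zeros(len(rho))
    for i in range(len(rho)):
        denser = np.flatnonzero(rho > rho[i])
        delta[i] = d[i, denser].min() if denser.size else d[i].max()
    return delta


# Toy data: two groups of Gaussian-shaped histograms on bins 0..9.
bins = np.arange(10)

def make_hist(center, width=1.0):
    h = np.exp(-((bins - center) ** 2) / (2 * width ** 2))
    return h / h.sum()

centers = [2.0, 2.3, 2.45, 2.8, 6.6, 7.0, 7.2, 7.7]
hists = [make_hist(c) for c in centers]
D = js_matrix(hists)
r, nns = natural_neighbors(D)
rho = local_density(D, nns)
delta = peak_distance(D, rho)
gamma = rho * delta  # decision-graph score; largest values are candidates
print(r, np.argsort(gamma)[::-1][:2])
```

Objects with a large γ = ρ·δ sit in the upper right of the decision graph. Per the abstract, UDPC-JS makes this selection among the initial cluster centers rather than among all objects, and with its own graph-based distance.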
Authors
LI Song
LIU Xiao-nan
LIU Juan
(School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China; Strategic Research, Qi An Xin Technology Group Inc., Beijing 100088, China)
Source
Journal of Jilin University (Engineering and Technology Edition)
EI
CAS
CSCD
Peking University Core Journal
2024, No. 7, pp. 2038-2048 (11 pages)
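Funding
National Natural Science Foundation of China (62072136)
Key Research and Development Program of Heilongjiang Province (2022ZX01A34)
National Key Research and Development Program of China (2020YFB1710200)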
Keywords
uncertain data
uncertain natural neighbors
JS divergence
density peak
clustering