Abstract
Clustering is an unsupervised learning method. It analyzes sample features to measure the similarity and dissimilarity between data points, then groups the data automatically so that similarity within clusters is high and differences between clusters are large. It is widely used in fields such as computer vision, text mining, and bioinformatics. Clustering algorithms still leave room for improvement in robustness, generality, and the selection of the number of clusters. Their effectiveness also depends heavily on the density and manifold structure of the dataset.

This paper proposes a robust evolutionary clustering algorithm based on local structure self-expression. The algorithm uses radial basis functions together with prior information to capture local density differences in the data, and builds a new similarity measure from them. This process incorporates two mechanisms: one extracts local structural features of the data, and the other recognizes stable clusters. Together they make the clustering more robust and more generally applicable. Dynamic evolutionary clustering is naturally suited to both goals because it can keep refining the clustering result during the dynamic clustering process, which substantially improves clustering performance. Through self-expression of the dataset's structural information, the new algorithm fuses local and global features while monitoring cluster stability during dynamic evolution, which yields better final clustering results. Experiments on synthetic and real-world datasets show that the new algorithm achieves superior clustering performance.
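The abstract says only that the similarity measure combines radial basis functions with prior information to capture local density differences. It does not give the exact formula. The Python sketch below illustrates the general idea with a well-known substitute: the locally scaled ("self-tuning") RBF kernel exp(-d_ij^2 / (sigma_i * sigma_j)). Here sigma_i is the distance from point i to its k-th nearest neighbour. The sketch also computes a simple relative local density, defined as a point's inverse mean kNN distance divided by the mean of that quantity over its k nearest neighbours. Both definitions, the function names, and the parameter k are assumptions for illustration. This is not the authors' measure, and it omits the prior information, the self-expression step, and the evolutionary stability monitoring.

```python
import math
from typing import List, Sequence

Point = Sequence[float]
EPS = 1e-12  # guards against division by zero for duplicate points


def knn_distances(points: List[Point], k: int) -> List[List[float]]:
    """Sorted distances from each point to its k nearest neighbours (self excluded)."""
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < number of points")
    result = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        result.append(dists[:k])
    return result


def relative_local_density(points: List[Point], k: int) -> List[float]:
    """Hypothetical definition: density_i = 1 / mean kNN distance, divided by
    the mean density of point i's k nearest neighbours (>1 means denser than its neighbourhood)."""
    knn = knn_distances(points, k)
    density = [1.0 / (sum(d) / k + EPS) for d in knn]
    rel = []
    for i, p in enumerate(points):
        # indices of the k nearest neighbours of point i
        order = sorted((j for j in range(len(points)) if j != i),
                       key=lambda j: math.dist(p, points[j]))
        neighbours = order[:k]
        rel.append(density[i] / (sum(density[j] for j in neighbours) / k))
    return rel


def locally_scaled_rbf(points: List[Point], k: int) -> List[List[float]]:
    """Symmetric similarity matrix exp(-d_ij^2 / (sigma_i * sigma_j)),
    where sigma_i is the distance from point i to its k-th nearest neighbour."""
    sigma = [d[-1] + EPS for d in knn_distances(points, k)]
    n = len(points)
    sim = [[1.0] * n for _ in range(n)]  # self-similarity is 1
    for i in range(n):
        for j in range(i + 1, n):
            d = math.dist(points[i], points[j])
            s = math.exp(-d * d / (sigma[i] * sigma[j]))
            sim[i][j] = sim[j][i] = s
    return sim


if __name__ == "__main__":
    # two small, well-separated groups of points
    pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    S = locally_scaled_rbf(pts, k=2)
    print(round(S[0][1], 3), S[0][3] < 1e-6)  # 0.493 True
```

Because each bandwidth adapts to its own neighbourhood, points in sparse regions get wider kernels than points in dense regions. This is one common way to make an RBF similarity sensitive to local density differences. In the example, within-group similarities fall between roughly 0.37 and 0.49, while cross-group similarities are vanishingly small.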
Authors
LI Chunzhong
JU Wenliang
JING Kaili
GUI Yang
Affiliations: School of Statistics and Applied Mathematics, Anhui University of Finance and Economics, Bengbu 233000; School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049; School of Mathematics and Physics, University of Science and Technology Beijing, Beijing 100083
Source
Chinese Journal of Engineering Mathematics (《工程数学学报》)
CSCD
Peking University Core Journal
2024, No. 6, pp. 1006-1020 (15 pages)
Funding
Natural Science Foundation of Higher Education Institutions of Anhui Province (KJ2021A0481, KJ2021A0473).
Keywords
clustering
similarity measurement
relative local density
nearest neighbor
self-expression