Abstract
With the rapid development of computer technology, data sources have become increasingly diverse, and traditional single-view clustering algorithms are no longer suited to processing multi-source heterogeneous data. As a result, multi-view clustering has become an active research topic. Although researchers have proposed a variety of multi-view clustering algorithms, improving clustering performance still requires further research. Following the complementarity and consensus principles of multi-view clustering, the key question for improving performance is how to fully extract the heterogeneous and comprehensive information across views. This paper presents a multi-view clustering algorithm based on multivariate self-learning and a fusion strategy (MSFC). The algorithm first performs multivariate self-learning: for every view, it uses the number of clusters and information entropy theory to obtain intra-view global variables, intra-view local variables, and inter-view variables. It then fuses all of these variables through a proposed similarity measure function. Finally, it obtains the clustering result with the K-means algorithm. Comparative experiments on multiple datasets show that the algorithm achieves good clustering performance.
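The abstract does not define the self-learned variables or the proposed similarity measure, so the block below is only a generic sketch of the overall pipeline shape it describes (per-view information, entropy-based weighting, fusion, then K-means), not MSFC itself. The Gaussian kernel, the row-entropy view weight, the function names (`gaussian_similarity`, `entropy_weight`, `fuse_and_cluster`) and the `sigma` parameter are all hypothetical choices made for illustration.

```python
import numpy as np

# HYPOTHETICAL sketch: a generic entropy-weighted multi-view fusion + K-means
# pipeline. It is NOT the MSFC algorithm, whose variables and similarity
# measure are not specified in the abstract.

def gaussian_similarity(X, sigma=1.0):
    # Pairwise squared Euclidean distances between the rows of X
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    # Assumed Gaussian (RBF) kernel as the per-view similarity
    return np.exp(-d2 / (2.0 * sigma ** 2))

def entropy_weight(S):
    # Row-normalise S so each row is a distribution over neighbours
    P = S / S.sum(axis=1, keepdims=True)
    # Mean Shannon entropy of those rows
    H = -np.sum(P * np.log(P + 1e-12), axis=1).mean()
    # Lower entropy = sharper neighbourhood structure; map to [0, 1]
    return 1.0 - H / np.log(S.shape[0])

def kmeans(Z, k, iters=100):
    # Deterministic farthest-point initialisation starting from sample 0
    centers = [Z[0]]
    for _ in range(1, k):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    C = np.array(centers)
    for _ in range(iters):
        # Assign each sample to its nearest center
        labels = np.argmin(((Z[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
        # Recompute centers, keeping the old one if a cluster empties
        new_C = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                          for j in range(k)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels

def fuse_and_cluster(views, k, sigma=1.0):
    # One similarity matrix per view
    sims = [gaussian_similarity(X, sigma) for X in views]
    # Entropy-based view weights, normalised to sum to 1 (epsilon avoids 0/0)
    w = np.array([entropy_weight(S) for S in sims]) + 1e-12
    w = w / w.sum()
    # Weighted fusion of the view similarities
    fused = sum(wi * S for wi, S in zip(w, sims))
    # K-means on the rows of the fused similarity matrix
    return kmeans(fused, k), w
```

For example, `fuse_and_cluster([X1, X2], k=2)` takes two views of the same samples (different feature sets, same row order) and returns one label per sample plus the learned view weights. Clustering the rows of the fused matrix directly keeps the sketch short; a spectral embedding before K-means is a common alternative.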
Authors
SHANG Xiaoqun (尚晓群)
YANG Haifeng (杨海峰)
CAI Jianghui (蔡江辉)
School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024
Source
Computer & Digital Engineering (《计算机与数字工程》)
2022, No. 6, pp. 1229-1232, 1285 (5 pages)
Funding
Supported by the National Natural Science Foundation of China (Grant No. U1931209).
Keywords
multivariate self-learning and fusion strategy
information entropy
multi-view clustering