Abstract
Multi-view clustering has recently become a research hotspot in graph data mining. However, because of limitations in data collection technology or human factors, multi-view data often suffer from missing views or missing samples. Reducing the impact of view incompleteness on clustering performance is a major challenge currently faced by multi-view clustering. A comprehensive review of recent developments in Incomplete Multi-view Clustering (IMC) therefore has significant theoretical and practical value. Firstly, the missing types of incomplete multi-view data were summarized and analyzed. Secondly, four types of IMC methods, based on Multiple Kernel Learning (MKL), Matrix Factorization (MF) learning, deep learning, and graph learning, were compared in detail, and the technical characteristics of and differences among representative methods were analyzed. Thirdly, twenty-two public incomplete multi-view datasets were summarized from the perspectives of dataset type, numbers of views and categories, and application field. Then, the evaluation metrics were outlined, and the performance of existing IMC methods on homogeneous and heterogeneous datasets was systematically analyzed. Finally, the existing problems, future research directions, and current application fields of IMC were discussed.
Authors
DONG Yao (董瑶)
FU Yixue (付怡雪)
DONG Yongfeng (董永峰)
SHI Jin (史进)
CHEN Chen (陈晨)
(School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China; Hebei Province Key Laboratory of Big Data Computing (Hebei University of Technology), Tianjin 300401, China; Hebei Engineering Research Center of Data-Driven Industrial Intelligence (Hebei University of Technology), Tianjin 300401, China)
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2024, No. 6, pp. 1673-1682 (10 pages)
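Funding
Science and Technology Research Project of Higher Education Institutions of Hebei Province (QN2021213, ZD2022082)
Higher Education Teaching Reform Research and Practice Project of Hebei Province (2020GJJG027, 2022GJJG049)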
Keywords
incompleteness
multi-view clustering
graph data mining
missing view
multi-view learning