
Multi-source Information Data Integration Algorithm Based on K-Medoids Clustering Algorithm (cited by: 4)
Abstract: Integrating multi-source information data is difficult when the similarity between source domains is low and hard to determine. To address this, we propose an integration method based on the K-medoids clustering algorithm. First, the clustering of multi-source data is treated as a transfer learning process: the initial sample weights are determined, the behavior of the weights and of the expected loss of the training samples is recorded at each iteration, and these characteristic parameters are used to decide whether a data item belongs to the source domain or the target domain. Next, the clustering step of the integration algorithm is recast as a diversified domain-value labeling problem. Once the data exhibit clustering characteristics, weight factors are computed between the data to be integrated in the source domain and in the target domain, the coverage property of the weight factors is used to determine the amount of interactive information between the two, and data with a high amount of information are integrated to ensure the success rate of integration. Simulation results show that the algorithm integrates efficiently both on stable datasets with few items and on disordered, larger, more heterogeneous datasets, with few secondary integrations and low overall cost.
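For readers unfamiliar with the clustering step the abstract builds on, the following is a minimal sketch of plain K-medoids using the alternating assign/update scheme. It is not the authors' integration algorithm: the transfer-learning weights, domain labeling, and interactive-information steps are not specified in the abstract and are left out. The function name `k_medoids`, the `init` parameter, and the choice of Euclidean distance are assumptions made for illustration.

```python
import random


def k_medoids(points, k, init=None, max_iter=100, seed=0):
    """Plain alternating K-medoids on a list of numeric tuples.

    Returns (medoids, labels): medoids are indices into `points`,
    labels[i] is the cluster index of points[i].
    `init` (hypothetical parameter) optionally fixes the starting medoids.
    """
    def dist(a, b):
        # Euclidean distance; any dissimilarity measure could be swapped in.
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    def assign(medoids):
        # Label each point with the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist(p, points[medoids[c]]))
                for p in points]

    if init is not None:
        medoids = list(init)
    else:
        medoids = random.Random(seed).sample(range(len(points)), k)

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:
                # Keep the old medoid if a cluster ends up empty.
                new_medoids.append(medoids[c])
                continue
            # The new medoid is the member with the smallest total
            # distance to all other members of its cluster.
            best = min(members,
                       key=lambda i: sum(dist(points[i], points[j])
                                         for j in members))
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids

    return medoids, assign(medoids)


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
medoids, labels = k_medoids(points, 2, init=[1, 4])
print(medoids, labels)
```

Unlike K-means, each cluster center here is an actual data point, which keeps the method usable with arbitrary dissimilarity measures and makes it less sensitive to outliers. Like K-means, the alternating scheme can stop in a local optimum, so the result depends on the starting medoids.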
Authors: ZHU Peng (祝鹏); GUO Yanguang (郭艳光) (Department of Computer Technology and Information Management, Inner Mongolia Agricultural University, Baotou 014109, Inner Mongolia Autonomous Region, China)
Source: Journal of Jilin University: Science Edition (《吉林大学学报(理学版)》), indexed in CAS and the Peking University Core Journals list, 2023, No. 3, pp. 665-670 (6 pages)
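Funding: Major Science and Technology Special Project of Inner Mongolia Autonomous Region (Grant No. 2021SZD0012-1); Science and Technology Plan Project of Inner Mongolia Autonomous Region (Grant No. 2020GG0033); Scientific Research Project of Higher Education Institutions of Inner Mongolia Autonomous Region (Grant No. NJZY20055); Philosophy and Social Science Planning Project of Inner Mongolia Autonomous Region (Grant No. 2020NDC067).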
Keywords: K-medoids clustering algorithm; multi-source data; source domain; target domain; amount of interactive information