
Clustering Algorithm Based on Memetic Algorithm and Hesitant Fuzzy Sets and Its Distributed Parallel Implementation
Cited by: 2
Abstract: To improve the clustering accuracy and efficiency for massive, high-dimensional, small-sample-size datasets, this paper proposes a high-dimensional big data clustering system based on a recursive memetic algorithm and cloud-based distributed computing. The iterative clustering system is built on the Spark distributed computing platform and consists of two stages: recursive memetic-algorithm-based feature reduction and density-based clustering. The first stage treats the clustering accuracy on gene microarray data as the primary objective and the number of features as the secondary objective, and reduces the feature space recursively. The second stage designs a density-based clustering algorithm grounded in hesitant fuzzy set theory, using a weighted hesitant fuzzy set correlation coefficient to measure the distance between data points. Simulation experiments on both synthetic datasets and clinical datasets show that the proposed algorithm achieves good results in clustering accuracy, scalability, and time efficiency.
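The two stages described in the abstract can be illustrated with a minimal single-machine Python sketch. It is not the author's implementation: the abstract gives no formulas, so everything below rests on assumptions. Each data point is represented as a hesitant fuzzy set with one hesitant fuzzy element (a small set of membership degrees in [0, 1]) per feature; shorter elements are padded with their largest value before pairing; the weighted correlation coefficient uses one form found in the hesitant fuzzy set literature, ρ(A, B) = C_w(A, B) / sqrt(C_w(A, A) · C_w(B, B)); the distance is taken as 1 - ρ; and plain DBSCAN stands in for the paper's density-based clustering. The helper `better` only illustrates the primary/secondary objective ordering of the feature-reduction stage; the memetic search itself and the Spark distribution are not shown. All function names, weights, data, and the `eps`/`min_pts` values are hypothetical.

```python
import math
from typing import List, Optional, Sequence

# Hypothetical representation: one hesitant fuzzy element (HFE), i.e. a
# non-empty list of membership degrees in [0, 1], per feature of a data point.
HFE = Sequence[float]
HFS = List[HFE]


def _align(a: HFE, b: HFE):
    """Sort both HFEs ascending and pad the shorter one with its largest value
    so that the j-th smallest degrees can be paired."""
    a, b = sorted(a), sorted(b)
    n = max(len(a), len(b))
    return a + [a[-1]] * (n - len(a)), b + [b[-1]] * (n - len(b))


def weighted_correlation(A: HFS, B: HFS, w: Sequence[float]) -> float:
    """rho(A, B) = C(A, B) / sqrt(C(A, A) * C(B, B)), where
    C(A, B) = sum_i w_i * mean_j(a_ij * b_ij) over aligned HFEs.
    With non-negative weights and degrees, rho lies in [0, 1]."""
    cab = caa = cbb = 0.0
    for a, b, wi in zip(A, B, w):
        a, b = _align(a, b)
        n = len(a)
        cab += wi * sum(x * y for x, y in zip(a, b)) / n
        caa += wi * sum(x * x for x in a) / n
        cbb += wi * sum(y * y for y in b) / n
    if caa == 0.0 or cbb == 0.0:
        return 0.0
    return cab / math.sqrt(caa * cbb)


def dbscan(points: List[HFS], w: Sequence[float], eps: float, min_pts: int) -> List[int]:
    """Plain DBSCAN on the distance d = 1 - rho; returns one label per point,
    with -1 marking noise."""
    n = len(points)
    dist = [[1.0 - weighted_correlation(p, q, w) for q in points] for p in points]
    # Each neighborhood includes the point itself (distance ~0).
    neighbors = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    labels: List[Optional[int]] = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_pts:
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:  # only core points expand further
                queue.extend(neighbors[j])
    return labels


def better(acc_a: float, n_feat_a: int, acc_b: float, n_feat_b: int) -> bool:
    """Lexicographic order for feature subsets: higher clustering accuracy
    wins (primary objective); on a tie, fewer features wins (secondary)."""
    return (acc_a, -n_feat_a) > (acc_b, -n_feat_b)


if __name__ == "__main__":
    P = [
        [[0.9, 0.8], [0.1]], [[0.85], [0.1, 0.2]], [[0.9], [0.15]],  # high on feature 0
        [[0.1], [0.9, 0.8]], [[0.2, 0.1], [0.85]], [[0.15], [0.9]],  # high on feature 1
        [[0.5], [0.5]],                                              # in between
    ]
    print(dbscan(P, [0.5, 0.5], eps=0.1, min_pts=2))  # [0, 0, 0, 1, 1, 1, -1]
```

Running the script prints `[0, 0, 0, 1, 1, 1, -1]`: the two groups of made-up points form separate clusters and the in-between point is labeled noise. Building the full distance matrix needs O(n²) correlation evaluations, which is the kind of workload a Spark-based system could spread across workers; how the paper actually partitions the computation is not described in the abstract.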
Author: 王超英 Wang Chaoying (Dongguan Polytechnic, Dongguan 523808, Guangdong, China)
Source: Computer Applications and Software (《计算机应用与软件》), Peking University core journal, 2021, Issue 4, pp. 295-304 (10 pages)
Keywords: Big data analysis; High dimensional small sample size data; Memetic algorithm; Distributed computing; Hesitant fuzzy set
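Related literature
References: 10
Secondary references: 83
Documents sharing references: 154
Co-cited documents: 13
Citing documents: 2
Secondary citing documents: 3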
