Research on Big Data Clustering Based on Optimized X-means Algorithm Under Hadoop Platform
Abstract: Existing clustering methods are limited in the scale of data they can process and often cluster poorly. To address both problems, this paper proposes a big data clustering method based on an optimized X-means algorithm running on the Hadoop platform. Big data samples are collected using the Hadoop platform architecture and its functions, and the initial samples are preprocessed through missing-value compensation, noise filtering, and normalization. Cluster centers are then selected, features are extracted from the cluster-center data and from all other data samples, and the feature similarity between each sample and the cluster centers is computed. With the similarity measurements as the clustering criterion, the optimized X-means algorithm determines the class to which each sample belongs, completing the clustering of the big data. Clustering-effectiveness experiments show that, under the two experimental conditions (with and without), the proposed method improves recall and precision over traditional clustering methods by 4.75% and 4.5%, respectively, and that the data it produces has a higher utilization rate.
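The preprocessing and similarity-based assignment steps described in the abstract can be illustrated with a minimal single-machine sketch. It is not the authors' implementation: the abstract does not specify the imputation rule, the similarity measure, or the X-means optimization, so column-mean imputation, min-max scaling, and the similarity `1 / (1 + Euclidean distance)` are assumptions chosen for illustration, and the function names are hypothetical. Noise filtering, Hadoop distribution, and the X-means step that chooses the number of clusters (by splitting centers and keeping splits that improve a model-selection score such as BIC) are left out.

```python
import math

def fill_missing(rows):
    """Replace None entries with the mean of the observed values in that column
    (assumed imputation rule; the abstract only says 'missing compensation')."""
    cols = list(zip(*rows))
    means = []
    for col in cols:
        observed = [v for v in col if v is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def min_max_normalize(rows):
    """Scale every column to [0, 1]; constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - lo[j]) / (hi[j] - lo[j]) if hi[j] > lo[j] else 0.0
             for j, v in enumerate(row)]
            for row in rows]

def similarity(a, b):
    """Assumed similarity measure: larger when the two points are closer."""
    return 1.0 / (1.0 + math.dist(a, b))

def assign(rows, centers):
    """Label each sample with the index of its most similar cluster center."""
    return [max(range(len(centers)), key=lambda k: similarity(row, centers[k]))
            for row in rows]

# Toy data with one missing value (hypothetical, for illustration only).
raw = [[1.0, 10.0], [2.0, None], [9.0, 50.0], [10.0, 48.0]]
data = min_max_normalize(fill_missing(raw))
centers = [data[0], data[2]]   # centers picked by hand for the example
labels = assign(data, centers)
print(labels)  # [0, 0, 1, 1]
```

In a Hadoop setting, the per-sample assignment would typically run in the map phase and the recomputation of centers in the reduce phase. The abstract does not say how the authors divide this work, so that mapping is also an assumption.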
Authors: ZHANG Pengfei; JIANG An; XIONG Nian (School of Computer, Guangdong Agriculture Industry Business Polytechnic, Guangzhou 510507, China; School of Information Science and Technology, Jinan University, Guangzhou 510632, China)
Source: Computer Measurement & Control (《计算机测量与控制》), 2023, No. 12, pp. 284-289, 309 (7 pages)
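Funding: Guangdong Provincial Key Field Special Project for Regular Higher Education Institutions (New Generation Information Technology), grant nos. 2023ZDZX1068 and 2021ZDZX1138.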
Keywords: Hadoop platform; optimized X-means algorithm; big data clustering