Big Data Query Processing Method Based on Dynamic Distributed Clustering Algorithm (cited by: 14)
Abstract: Existing spatial query processing methods for big data suffer from long execution times and insufficiently accurate query results. To address these problems, a big data query processing method based on a dynamic distributed clustering algorithm is proposed. The method consists of three parts: data preprocessing, data clustering, and query processing. First, the input data are divided into multiple subsets and stored in RDD (Resilient Distributed Dataset) format across a group of machine nodes. Second, a hybrid partitional and hierarchical dynamic clustering algorithm clusters the data in a distributed manner on the Apache Spark platform. Finally, K-nearest-neighbor queries are used to obtain accurate and efficient query results. Experimental results show that the proposed method is scalable, provides high-quality results for spatial query processing, and compares favorably with other query methods.
Authors: TANG Yun-le (唐运乐); WEI Xing-qiong (韦杏琼) (School of Electromechanical and Information Engineering, Guangxi Vocational & Technical College, Nanning 530226, China; School of Information Science and Engineering, Guangxi University for Nationalities, Nanning 530006, China)
Source: Journal of Southwest China Normal University (Natural Science Edition) (《西南师范大学学报(自然科学版)》), CAS, 2021, No. 5, pp. 134-139 (6 pages)
Funding: Natural Science Foundation Project of the Guangxi Department of Education (2019KY1220).
Keywords: big data; dynamic distributed clustering; query processing; Apache Spark
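
The three stages named in the abstract (partitioning the input, clustering each subset, answering K-nearest-neighbor queries) can be illustrated with a small single-machine sketch. This is not the authors' algorithm: plain k-means stands in for their hybrid partitional and hierarchical clustering, ordinary Python lists stand in for Spark RDD partitions, and all function names and parameters (`partition`, `kmeans`, `build_index`, `knn_query`, `n_parts`, `k_per_part`) are hypothetical. The query step uses each cluster's centroid and radius to skip clusters that cannot contain a closer neighbor, which is one common way clustering can speed up a KNN search.

```python
import heapq
import math
import random


def partition(points, n_parts):
    # Round-robin split; a stand-in for distributing data across nodes.
    return [points[i::n_parts] for i in range(n_parts)]


def kmeans(points, k, iters=10, seed=0):
    # Plain Lloyd's k-means on one subset (assumption: the paper's hybrid
    # partitional/hierarchical algorithm is NOT reproduced here).
    rng = random.Random(seed)
    centroids = rng.sample(points, min(k, len(points)))
    groups = []
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            j = min(range(len(centroids)),
                    key=lambda c: math.dist(p, centroids[c]))
            groups[j].append(p)
        # Recompute each centroid as the mean of its members.
        centroids = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centroids)
        ]
    return [(c, g) for c, g in zip(centroids, groups) if g]


def build_index(points, n_parts=4, k_per_part=3):
    # Cluster every subset independently and record (centroid, radius, members).
    index = []
    for part_id, subset in enumerate(partition(points, n_parts)):
        for centroid, members in kmeans(subset, k_per_part, seed=part_id):
            radius = max(math.dist(centroid, m) for m in members)
            index.append((centroid, radius, members))
    return index


def knn_query(index, q, k):
    # dist(q, centroid) - radius is a lower bound on the distance from q
    # to any member of that cluster, so clusters are visited in that order.
    order = sorted(index, key=lambda c: math.dist(q, c[0]) - c[1])
    best = []  # max-heap of the k closest points so far, as (-dist, point)
    for centroid, radius, members in order:
        bound = math.dist(q, centroid) - radius
        if len(best) == k and bound > -best[0][0]:
            break  # no remaining cluster can improve the result
        for m in members:
            d = math.dist(q, m)
            if len(best) < k:
                heapq.heappush(best, (-d, m))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, m))
    return sorted((-nd, p) for nd, p in best)


# Example usage on synthetic 2-D points.
demo_rng = random.Random(1)
demo_points = [(demo_rng.random(), demo_rng.random()) for _ in range(100)]
demo_index = build_index(demo_points)
for dist, point in knn_query(demo_index, (0.5, 0.5), 3):
    print(round(dist, 4), point)
```

Because the pruning bound is conservative, this sketch returns the same distances as a brute-force scan; in an actual Spark deployment the per-subset clustering would run in parallel on the worker nodes rather than in a local loop.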