Abstract
Spatio-temporal clustering analysis is an effective way to exploit spatio-temporal big data. Traditional clustering algorithms have several shortcomings: they handle large-scale distributed data poorly, take a long time to process massive data, require parameters that are hard to determine, and yield low-quality clusters. This paper therefore proposes a distributed incremental clustering process (DICP) based on MapReduce. DICP applies incremental clustering distributed across a wide-area network, which avoids transferring and copying large amounts of data and effectively improves clustering efficiency. For the clustering step within DICP, the paper studies IMSTDCA, a spatio-temporal data clustering algorithm for big-data environments. Drawing on density-based clustering, IMSTDCA completes the analysis in three steps: pre-analysis of the clustering tendency of the spatio-temporal data, the clustering itself, and evaluation of the clustering results. This enables fast and efficient information mining from spatio-temporal big data.
Source
Engineering of Surveying and Mapping (《测绘工程》)
CSCD
2017, No. 11, pp. 12-17 (6 pages)
Funding
National Natural Science Foundation of China (41501178)
Doctoral Research Start-up Fund of Henan University of Economics and Law (800257)
Keywords
spatio-temporal data
big data
cluster analysis
incremental clustering
spatio-temporal neighborhood