Research on the Distributed Incremental IMSTDCA Clustering Method for Spatio-Temporal Big Data
Abstract: Spatio-temporal clustering analysis is an effective way to exploit spatio-temporal big data. Traditional clustering algorithms have several shortcomings: they handle large-scale distributed data poorly, take a long time to process massive data, require parameters that are difficult to determine, and produce clusters of poor quality. This paper therefore proposes a distributed incremental clustering process (DICP), based on MapReduce, that clusters incrementally across sites distributed over a wide-area network; this avoids transferring and copying large amounts of data and improves the efficiency of the clustering computation. For the clustering algorithm used within DICP, the paper studies IMSTDCA, a spatio-temporal data clustering algorithm for big-data environments. Drawing on density-based clustering, IMSTDCA completes the analysis in three steps: pre-analysis of the clustering tendency of the spatio-temporal data, the spatio-temporal clustering itself, and evaluation of the clustering result. In this way it enables fast and efficient information mining from spatio-temporal big data.
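To make the idea of density clustering over a spatio-temporal neighborhood concrete, the following is a minimal sketch. It is not the IMSTDCA algorithm, whose details are not given in this record; it is a generic DBSCAN-style procedure in which two points are neighbors only if they are close in both space and time. The function names (`st_neighbors`, `st_density_cluster`, `summarize`), the parameters (`eps_space`, `eps_time`, `min_pts`), and the sample points are all hypothetical. The tendency pre-analysis step and the distributed incremental part of DICP are left out, and the evaluation is reduced to a cluster count and a noise ratio.

```python
from math import hypot

def st_neighbors(points, i, eps_space, eps_time):
    """Indices of points within eps_space (planar) and eps_time of point i.

    The point itself is included, following the usual DBSCAN convention.
    """
    x, y, t = points[i]
    return [j for j, (xj, yj, tj) in enumerate(points)
            if hypot(xj - x, yj - y) <= eps_space and abs(tj - t) <= eps_time]

def st_density_cluster(points, eps_space, eps_time, min_pts):
    """DBSCAN-style clustering of (x, y, t) points; label -1 marks noise."""
    NOISE, UNSEEN = -1, None
    labels = [UNSEEN] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not UNSEEN:
            continue
        seeds = st_neighbors(points, i, eps_space, eps_time)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster            # i is a core point: start a cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point: joins, is not expanded
            if labels[j] is not UNSEEN:
                continue
            labels[j] = cluster
            nbrs = st_neighbors(points, j, eps_space, eps_time)
            if len(nbrs) >= min_pts:   # j is also a core point: keep growing
                queue.extend(nbrs)
        cluster += 1
    return labels

def summarize(labels):
    """Very simple result check: number of clusters and share of noise points."""
    n_clusters = len({c for c in labels if c != -1})
    noise_ratio = labels.count(-1) / len(labels)
    return n_clusters, noise_ratio

# Hypothetical sample: two groups at the same place but far apart in time,
# plus one isolated point.
points = [
    (0.0, 0.0, 0), (0.1, 0.0, 1), (0.0, 0.1, 2),
    (0.0, 0.0, 100), (0.1, 0.1, 101), (0.1, 0.0, 102),
    (5.0, 5.0, 1),
]
labels = st_density_cluster(points, eps_space=0.5, eps_time=5, min_pts=3)
print(labels)             # [0, 0, 0, 1, 1, 1, -1]
print(summarize(labels))  # 2 clusters, one noise point out of seven
```

The two groups overlap in space, so a purely spatial density clustering would merge them; the time threshold is what separates them here. The neighbor search is a linear scan, which is fine for illustration but would need a spatial index or a partitioned scheme for large data.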
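Authors: Li Xin (李欣), Meng Deyou (孟德友)
Source: Engineering of Surveying and Mapping (《测绘工程》), CSCD, 2017, No. 11, pp. 12-17 (6 pages)
Funding: National Natural Science Foundation of China (41501178); Doctoral Research Start-up Fund of Henan University of Economics and Law (800257)
Keywords: spatio-temporal data; big data; cluster analysis; incremental clustering; spatio-temporal neighborhood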