Research on a Distributed Incremental IMSTDCA Clustering Method for Spatio-temporal Big Data


Abstract: Spatio-temporal clustering analysis is an effective way to exploit spatio-temporal big data. Traditional clustering algorithms have several shortcomings: they handle large-scale distributed data poorly, take a long time to process massive data, make parameters hard to determine, and yield low-quality clusters. This paper therefore proposes a distributed incremental clustering process (DICP) based on MapReduce. It applies incremental clustering across sites distributed over a wide area network, which avoids transferring and copying large volumes of data and improves the efficiency of the clustering computation. For the clustering algorithm used within DICP, the paper studies IMSTDCA, a spatio-temporal data clustering algorithm for big-data environments. Drawing on the idea of density-based clustering, it completes the analysis in three steps: pre-analysis of the clustering tendency of the spatio-temporal data, the clustering itself, and evaluation of the clustering results. The aim is fast, efficient information mining from spatio-temporal big data.
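The abstract does not give the details of IMSTDCA, so the following is only a minimal sketch of the general idea it builds on: density-based clustering over a spatio-temporal neighborhood, in the style of DBSCAN. It is not the authors' algorithm. The function names and the parameters `eps_s` (spatial radius), `eps_t` (time window), and `min_pts` (density threshold) are assumptions introduced for illustration, and points are assumed to be `(x, y, t)` tuples.

```python
from math import hypot

NOISE = -1  # label for points that belong to no cluster


def st_neighbors(points, i, eps_s, eps_t):
    """Indices of points within eps_s in space AND eps_t in time of point i.

    The result includes i itself (distance 0), as in standard DBSCAN.
    """
    x, y, t = points[i]
    return [
        j for j, (xj, yj, tj) in enumerate(points)
        if hypot(x - xj, y - yj) <= eps_s and abs(t - tj) <= eps_t
    ]


def st_density_cluster(points, eps_s, eps_t, min_pts):
    """DBSCAN-style labelling using a spatio-temporal neighborhood."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = st_neighbors(points, i, eps_s, eps_t)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = st_neighbors(points, j, eps_s, eps_t)
            if len(nbrs) >= min_pts:  # core point: keep growing the cluster
                queue.extend(nbrs)
        cluster += 1
    return labels


# Two groups at the same location but far apart in time, plus one outlier.
points = [
    (0.0, 0.0, 0), (0.1, 0.0, 1), (0.0, 0.1, 2), (0.1, 0.1, 1),
    (0.0, 0.0, 100), (0.1, 0.0, 101), (0.0, 0.1, 102),
    (5.0, 5.0, 50),
]
labels = st_density_cluster(points, eps_s=0.5, eps_t=5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

The example shows why the temporal threshold matters: the two groups overlap in space, and a purely spatial density clustering would merge them. The brute-force neighbor search here is quadratic; a real big-data setting would need a spatial index or a partitioned scheme, which this sketch does not attempt.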

Authors: Li Xin (李欣), Meng Deyou (孟德友)

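Affiliations: Henan Collaborative Innovation Center for the Coordinated Development of the "Three Modernizations" in the Central Plains Economic Zone, Henan University of Economics and Law; College of Resources and Environment, Henan University of Economics and Law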

Source: Engineering of Surveying and Mapping (《测绘工程》), CSCD, 2017, No. 11, pp. 12-17 (6 pages)

Funding: National Natural Science Foundation of China (41501178); Doctoral Research Start-up Fund of Henan University of Economics and Law (800257)

Keywords: spatio-temporal data; big data; cluster analysis; incremental clustering; spatio-temporal neighborhood

Classification number: K909 [Historical Geography, Human Geography]



