摘要
随着移动定位技术的蓬勃发展和移动定位设备的广泛应用,衍生了海量移动对象的位置信息.该类位置信息包含地理坐标、速度、方向以及时间戳等信息,被实时采集且持续增加,形成了大规模高速的分布式轨迹数据流.及时、有效地对分布式的轨迹流数据进行在线聚类分析,可以实时获取移动对象的共同移动趋势.由于轨迹数据流固有的海量、高速、偏态分布、时变进化且存在概念漂移的特性以及在线聚类的严格时空开销需求,基于静态轨迹数据的聚类方法不能直接应用于分布式轨迹流的在线聚类分析.分布式的轨迹流聚类研究面临巨大挑战,研究工作仍处于初期探索阶段.为解决上述问题,面对地理上分散采集的轨迹流数据,亟需设计高效的并行聚类分析任务及确保传输开销最小化的通信机制来满足低处理延迟的实时聚类需求.该文首先设计了分布式聚类概要数据结构以实时获取相似轨迹簇的时空特征,继而维护持续进化的分布式轨迹数据流.在此基础上,以减少通信开销提高分布式轨迹流聚类效率为目标,提出了一个在线处理分布式轨迹数据流的增量聚类算法(OCluDTS).OCluDTS方法使用基于滑动窗口模型的两层分布式框架,通过多个远程节点并行聚类局部轨迹流以及协调者节点合并局部聚类结果的方式,确保分布式轨迹流聚类获得与集中式方法相同的精度.此外,为了进一步降低OCluDTS算法的总执行开销,提出了仅限于聚类更新的远程节点传输聚类结果给协调者节点以及基于协调者节点相似性计算的剪枝策略等优化措施.最后,理论分析以及基于真实轨迹数据集的实验结果验证了OCluDTS算法处理大规模分布式轨迹流数据时的有效性和高效性.
With the widespread application of modern mobile devices and the vigorous development of location acquisition technologies,numerous moving objects have been relaying their locations continuously.Such the location information involves the geographic coordinates,speed,direction,as well as timestamps,etc.They are collected in real time and the amount of data continues to increase,and hence a tremendous amount of distributed trajectory data streams are generated.This necessitates to conduct the analysis task like clustering upon the distributed trajectory data streams timely and effectively to gain insights about the common moving trends of the moving objects.Due to the massive volume,high velocity,skewness distribution,time-varying evolution property of the streaming trajectories,the existing trajectory clustering techniques on the static trajectory data set cannot be directly applied to the distributed trajectory data streams.In addition,the key issue also comes from the strict space-and time-complexities of processing the continuously arrived trajectory data,combined with the concept drift that emerged in the streaming cases.Therefore,the study of clustering upon the distributed trajectory streams have faced the huge challenges,which makes this topic is still in its early stage of exploration.To address the above mentioned issues,it is imperative to develop the efficient parallel clustering analysis tasks,and exploit the effective communication mechanisms that minimize the transmission overhead to meet the real-time clustering requirement with low processing latency in processing the trajectory streams that collected in geographically dispersed manner.In this paper,we firstly present a novel distributed synopsis structure to extract the clustering characteristic of the trajectory cluster and further keep the track of the evolving distributed trajectory data streams.On the basis of that,with the aim of reducing the communication overhead and improving the clustering performance upon the distributed trajectory data streams,we develop an incremental algorithm for online clustering upon the distributed trajectory streams(called OCluDTS).OCluDTS algorithm leverages the sliding window model and consists of two phases.At the first phase(called as the local clustering phase),the most recent arrived sets of trajectories at each time instant in current window for all the remote site are conducted clustering in parallel to obtain the local clustering results.At the second phase(called as the local clustering results merging phase),the local clustering results of all the remote sites are transferred to the coordinator to participate in the re-clustering process to derive the final clustering results.This two-phase clustering mechanism could guarantee achieving the same high clustering precision as the centralized solution.Moreover,the pruning mechanism of similarity calculation,and the optimization strategy that testing first and transferring later enable OCluDTS algorithm to further boost the efficiency.Finally,we conduct extensive experiments on the real data set to evaluate the effectiveness and efficiency of OCluDTS through comparing it with the centralized algorithm.Theoretical analysis and experimental results show that OCluDTS can achieve the superior performance in clustering on the massive amounts of streaming trajectories.
作者
毛嘉莉
陈鹤
宋秋革
金澈清
周傲英
MAO Jia-Li;CHEN He;SONG Qiu-Ge;JIN Che-Qing;ZHOU Ao-Ying(School of Data Science and Engineering,East China Normal University,Shanghai 200062;Computer School,China West Normal University,Nanchong,Sichuan 637009)
出处
《计算机学报》
EI
CSCD
北大核心
2018年第9期2120-2133,共14页
Chinese Journal of Computers
基金
国家自然科学基金(61702423
61370101
61532021
U1501252
U1401256
61402180)
四川省教育厅自然科学重点项目(17ZA0381)
西华师范大学国家培育项目(16C005)
西华师范大学英才基金(17YC158)资助~~
关键词
分布式轨迹流
聚类
滑动窗口
分布式时序轨迹聚类特征
概念漂移
distributed trajectory streams
clustering
sliding window model
distributed temporal trajectory cluster feature
concept drift