
Taxonomic Preprocessing of Check-in Trajectory Big Data Based on Density Clustering (基于密度聚类的签到轨迹大数据分层预处理研究)

Cited by: 4
Abstract: With the development of location-based social networks, the volume of trajectory data carrying spatio-temporal and textual information has grown exponentially, and the problem of low data quality has become increasingly prominent. High-quality check-in data lets researchers extract rich and meaningful knowledge, so data preprocessing is essential for using check-in big data effectively. Check-in data suffers from several quality problems, namely high redundancy, simultaneous check-ins, and large spatio-temporal spans between check-ins, so existing preprocessing pipelines and methods cannot be applied directly. Based on these characteristics, we propose a preprocessing pipeline tailored to check-in data. Simultaneous check-ins in a trajectory are eliminated by averaging. The trajectory is divided by learning an entropy-based timestamp-interval threshold, which addresses its large time span. A density-based clustering method is used to layer the check-in trajectory, which addresses its large spatial span. Experiments on real check-in trajectory data evaluate the preprocessing from two aspects: outliers and the layering (taxonomy) effect. The results show that check-in trajectories are separated at different spatial granularities, which lays a foundation for subsequent trajectory analysis and mining.
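Authors: Wen Ruoqing (文若晴); Ma Ang (马昂); Pan Xiao (潘晓); Yang Weiwei (杨伟伟). Affiliations: Shijiazhuang Tiedao University, Shijiazhuang 050043, Hebei, China; Key Research Base for Humanities and Social Sciences in Hebei Province (Shijiazhuang Tiedao University), Shijiazhuang 050043, Hebei, China
Source: Computer Applications and Software (《计算机应用与软件》), Peking University Core Journal (北大核心), 2019, No. 3, pp. 20-28, 56 (10 pages in total)
Funding: National Natural Science Foundation of China (61303017); Natural Science Foundation of Hebei Province (F2018210109); Hebei Provincial Department of Education, Youth (ZD2018040); Fourth Outstanding Young Scientists Fund of Shijiazhuang Tiedao University (Z661250444); Graduate Innovation Funding Project of Shijiazhuang Tiedao University (YC201718); National Undergraduate Innovation and Entrepreneurship Training Program (201710107007)
Keywords: check-in trajectory; preprocessing; trajectory similarity; clustering; taxonomic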
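The three preprocessing steps named in the abstract (averaging simultaneous check-ins, splitting a trajectory at large timestamp gaps, and density-based clustering) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The record does not give their algorithms in detail, so several things here are assumptions: check-ins are modeled as hypothetical `(timestamp, x, y)` tuples in planar coordinates, the gap threshold `max_gap` is passed in by hand rather than learned from entropy as in the paper, and plain DBSCAN stands in for whichever density-based method the paper uses. All function names are hypothetical.

```python
from collections import defaultdict
from math import hypot

# Assumption: a check-in is (timestamp_seconds, x, y) in planar coordinates,
# so that Euclidean distance is meaningful.

def average_simultaneous(checkins):
    """Merge check-ins that share a timestamp into one point at their mean position."""
    groups = defaultdict(list)
    for t, x, y in checkins:
        groups[t].append((x, y))
    merged = []
    for t in sorted(groups):
        pts = groups[t]
        mean_x = sum(p[0] for p in pts) / len(pts)
        mean_y = sum(p[1] for p in pts) / len(pts)
        merged.append((t, mean_x, mean_y))
    return merged

def split_by_gap(checkins, max_gap):
    """Cut a time-ordered trajectory wherever consecutive check-ins are more
    than max_gap apart. The paper learns this threshold from entropy; here it
    is simply a parameter."""
    segments = []
    for c in checkins:
        if segments and c[0] - segments[-1][-1][0] <= max_gap:
            segments[-1].append(c)
        else:
            segments.append([c])
    return segments

def dbscan(points, eps, min_pts):
    """Plain DBSCAN over (x, y) points. Returns one label per point; -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j, q in enumerate(points)
                if hypot(points[i][0] - q[0], points[i][1] - q[1]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1          # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbours(j)
            if len(nb) >= min_pts:   # j is a core point, keep expanding
                queue.extend(nb)
    return labels

raw = [(0, 0.0, 0.0), (0, 2.0, 0.0), (60, 1.0, 1.0), (100000, 50.0, 50.0)]
merged = average_simultaneous(raw)
segments = split_by_gap(merged, max_gap=3600)
labels = dbscan([(0, 0), (0, 1), (1, 0), (10, 10)], eps=1.5, min_pts=3)
```

In this toy run the two check-ins at timestamp 0 collapse into one averaged point, the check-in at timestamp 100000 falls into its own segment, and the isolated point `(10, 10)` is labeled as noise. Re-running `dbscan` with different `eps` values is one possible way to obtain clusters at different spatial granularities, though whether the paper layers trajectories this way is not stated in the abstract.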
