Crowd profiling algorithm for massive transit data

Cited by: 2
Abstract: Crowd profiling based on massive transit data is valuable for analyzing the travel characteristics of urban groups and traffic conditions, but processing such data is time-consuming, yields low-quality results, and is difficult to interpret. This paper proposes a systematic strategy for crowd profiling on massive transit data. First, the PageRank algorithm is used to select the trajectories of passengers who pass through important stations, which greatly reduces the volume of trajectory data for the target population. Second, a trajectory textualization method is proposed to improve the interpretability of the crowd profiles. Third, K-means with cosine distance is analyzed and chosen as the clustering algorithm for classifying the profiles. Experiments on the transit data of 30 million passengers show that the proposed strategy addresses crowd profiling on massive transit data fairly systematically, and that K-means with cosine distance gives the best clustering results, with an accuracy of about 80%. The crowd profiles and their trajectories are visualized with a Flow Map, and the results are consistent with real-world crowd behavior.
Authors: ZHANG Jin (张锦); ZHANG Jianzhong (张建忠); WANG Fei (汪飞); GUO Qian (郭芊). Affiliations: College of Information Science and Engineering, Hunan Normal University, Changsha 410006, China; School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha 410114, China; School of Mathematics and Statistics, Hunan Normal University, Changsha 410006, China.
Source: Journal of National University of Defense Technology (《国防科技大学学报》), 2023, No. 2, pp. 55-64 (10 pages). Indexed in: EI, CAS, CSCD, PKU Core (北大核心).
Funding: National ministry-level fund project (31511010105); Natural Science Foundation of Hunan Province (2021JJ30456).
Keywords: crowd portraits; PageRank algorithm; trajectory textualization; text clustering
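The three steps named in the abstract (PageRank-based station filtering, trajectory textualization, and K-means with cosine distance) can be illustrated with a minimal, standard-library Python sketch. This is not the authors' implementation: the abstract does not specify how trajectories are textualized, so the sketch assumes the simplest option (a trajectory is a "document" whose words are station names, weighted by term frequency). The station names, the damping factor of 0.85, `top_k`, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def pagerank(trajectories, damping=0.85, iterations=50):
    """Power-iteration PageRank on the station graph implied by consecutive stops."""
    out_links = defaultdict(set)
    for traj in trajectories:
        for src, dst in zip(traj, traj[1:]):
            if src != dst:
                out_links[src].add(dst)
    stations = sorted({s for traj in trajectories for s in traj})
    n = len(stations)
    rank = {s: 1.0 / n for s in stations}
    for _ in range(iterations):
        # Rank held by stations without outgoing edges is spread uniformly.
        dangling = sum(rank[s] for s in stations if not out_links.get(s))
        new_rank = {s: (1.0 - damping + damping * dangling) / n for s in stations}
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

def select_by_important_stations(trajectories, rank, top_k):
    """Keep only trajectories that visit at least one of the top_k ranked stations."""
    important = set(sorted(rank, key=rank.get, reverse=True)[:top_k])
    return [t for t in trajectories if important.intersection(t)], important

def textualize(traj):
    """Assumed textualization: unit-length term-frequency vector over station names."""
    counts = Counter(traj)
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()}

def cosine(u, v):
    """Cosine similarity of two unit-length sparse vectors (a plain dot product)."""
    return sum(w * v.get(tok, 0.0) for tok, w in u.items())

def cosine_kmeans(vectors, k, iterations=20):
    """K-means using cosine similarity, with deterministic farthest-first seeding."""
    centroids = [dict(vectors[0])]
    while len(centroids) < k:
        farthest = min(vectors, key=lambda v: max(cosine(v, c) for c in centroids))
        centroids.append(dict(farthest))
    labels = []
    for _ in range(iterations):
        # Assignment step: the most similar centroid wins.
        labels = [max(range(k), key=lambda j: cosine(v, centroids[j])) for v in vectors]
        # Update step: re-normalized sum of the member vectors.
        for j in range(k):
            total = defaultdict(float)
            for v, lab in zip(vectors, labels):
                if lab == j:
                    for tok, w in v.items():
                        total[tok] += w
            norm = math.sqrt(sum(w * w for w in total.values()))
            if norm > 0:
                centroids[j] = {tok: w / norm for tok, w in total.items()}
    return labels

# Toy data with hypothetical station names; real input would be smart-card records.
trajectories = [
    ["home_A", "hub", "office_X"],
    ["home_A", "hub", "office_X", "hub", "home_A"],
    ["home_B", "hub", "campus_Y"],
    ["home_B", "hub", "campus_Y", "hub", "home_B"],
    ["suburb_1", "suburb_2"],
]
rank = pagerank(trajectories)
selected, important = select_by_important_stations(trajectories, rank, top_k=1)
labels = cosine_kmeans([textualize(t) for t in selected], k=2)
print(important, labels)  # {'hub'} [0, 0, 1, 1]
```

On this toy input the suburban trajectory is dropped because it never touches the top-ranked station, and the remaining four trajectories split into an office-bound group and a campus-bound group. Normalizing every vector to unit length makes cosine similarity insensitive to trip length, which is one plausible reason cosine distance suits textualized trajectories better than Euclidean distance.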