Abstract
To extract valuable information from massive mobile user trajectory data, a Spark cluster is used for distributed processing of the raw user behavior trajectory data. The data held in the HBase distributed data store are filtered and analyzed, then stored in a Hive database. Spark then reads the data from the database files and performs a chi-square test of correlation and K-Means clustering analysis, yielding statistics on users' travel modes and the general pattern of travel mode choice under distance constraints. This process provides a feasible solution for user behavior analysis and prediction, and has both research and practical value.
Authors
张嘉诚
张晓滨
ZHANG Jiacheng; ZHANG Xiaobin (School of Computer Science, Xi'an Polytechnic University, Xi'an 710018, China)
Source
《西安工程大学学报》
CAS
2018, Issue 3, pp. 343-347 (5 pages)
Journal of Xi’an Polytechnic University
Funding
Natural Science Foundation of Shaanxi Province (2015JQ5157)