Abstract
Traditional statistical analysis tools cannot directly model and analyze massive raw data. This paper therefore uses big data tools to extract, transform, and load (ETL) the raw data, and then applies statistical analysis tools to the resulting sample data for visualization and predictive analysis. A Hadoop distributed cluster is used to clean more than 8 million card-swipe records generated over five months of 2014 by LingnanTong (岭南通) users on some bus routes in Guangdong Province. The sample data show that bus passenger flow fluctuates with a weekly period. Taking into account this weekly regularity and the rest-day effect (public holidays and weekends), a multivariate seasonal time series model (SARIMAX) is fitted to the passenger flow per daily time slot. The model is then evaluated by extrapolative forecasting over five days; the average error does not exceed 5%, which indicates that the model is suitable for short-term forecasting of bus passenger flow by time slot.
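The data preparation described in the abstract can be illustrated with a minimal sketch in plain Python. It is not the paper's Hadoop pipeline: the timestamp format, the one-hour time slot, the `HOLIDAYS` set, and all function names are assumptions made for illustration. The abstract also does not define its "average error", so the mean absolute percentage error used here is one plausible reading, not the paper's stated metric.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical holiday list; a real analysis would use the official 2014 calendar.
HOLIDAYS = {date(2014, 5, 1)}


def is_rest_day(d, holidays=HOLIDAYS):
    """Return True for Saturdays, Sundays, and listed public holidays."""
    return d.weekday() >= 5 or d in holidays


def slot_counts(timestamps, slot_hours=1):
    """Count card swipes per (date, slot index), with slot = hour // slot_hours.

    Assumes timestamps formatted as 'YYYY-MM-DD HH:MM:SS' (an assumption;
    the abstract does not describe the raw record layout).
    """
    counts = Counter()
    for ts in timestamps:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")
        counts[(t.date(), t.hour // slot_hours)] += 1
    return counts


def mean_abs_pct_error(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    ratios = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)
```

The per-slot counts form the series to be modeled, and the `is_rest_day` flag is the kind of indicator that could serve as the exogenous regressor in a SARIMAX model with a seven-day seasonal period.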
Author
梁均
LIANG Jun (Wuhan University, Wuhan 430072, China)
Source
《长江工程职业技术学院学报》
CAS
2018, No. 1, pp. 4-7 (4 pages)
Journal of Changjiang Institute of Technology