Abstract
Using large-sample, multidimensional data from a bike-sharing project, this article applies machine learning models, including lasso regression, ridge regression, random forest, and gradient boosting decision tree (GBDT), to identify the main factors influencing short-term (hourly) bike-sharing demand and to compare the predictive performance of the models. The results show that the main factors affecting hourly demand are specific location, time, and weather conditions. Compared with ordinary linear regression, lasso regression, and ridge regression, the random forest and GBDT models predict short-term demand more accurately, with higher goodness of fit (R²) and lower root mean squared error (RMSE) in both in-sample fitting and out-of-sample prediction, which makes them a more effective tool for precise short-term, real-time demand prediction in the bike-sharing industry.
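The two evaluation metrics named in the abstract, R² and RMSE, can be illustrated with a minimal Python sketch. The function names and the hourly counts below are hypothetical, chosen only for illustration; they are not taken from the article or its data.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error: the square root of the mean squared residual."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical hourly rental counts and model predictions (illustrative only).
hourly_demand = [10, 20, 30, 40]
model_forecast = [12, 18, 33, 39]

print(f"RMSE = {rmse(hourly_demand, model_forecast):.3f}")
print(f"R^2  = {r_squared(hourly_demand, model_forecast):.3f}")
```

A model that predicts better on the same data yields a smaller RMSE and an R² closer to 1, which is the sense in which the abstract ranks the tree-based models above the linear ones.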
Authors
焦志伦
金红
刘秉镰
张子豪
JIAO Zhi-lun; JIN Hong; LIU Bing-lian; ZHANG Zi-hao (College of Economic and Social Development, Nankai University, Tianjin 300071, China; Analytic Partners, Inc., New York, NY 10017, USA)
Source
《商业经济与管理》
CSSCI
Peking University Core Journals (北大核心)
2018, No. 8, pp. 16-25, 35 (11 pages in total)
Journal of Business Economics
Funding
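National Natural Science Foundation of China project "Decision Optimization and Supply Chain Coordination of O2O Service Firms Considering Consumer Behavior" (71772095)
Supported by the "Collaborative Innovation Center for Socialist Economic Construction with Chinese Characteristics" (中国特色社会主义经济建设协同创新中心)
Nankai University Humanities and Social Sciences Young Faculty Research Start-up Project "Research on the Internet Revolution and the Transformation of Logistics Business Formats"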
Keywords
bike sharing
big data
demand prediction
machine learning