Abstract
As a technological innovation, shared bicycles address the urban "last mile" problem. Bike-sharing supply and demand are unbalanced, complex, and highly variable, and traditional machine learning tools involve cumbersome workflows with unclear processes. To address both issues, this study uses the Spark computing framework and Spark machine learning pipelines to bring the UCI bike-sharing dataset onto the Spark platform. It then builds separate regression models with four machine learning methods: linear regression, decision tree, random forest, and gradient-boosted trees. Random forest gives the best predictions, with an RMSE of 50.95, an MAE of 34.67, and an R² of 0.92. The model is highly accurate and can serve as a useful reference for bicycle scheduling and demand forecasting.
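The sketch below shows how the three reported metrics (RMSE, MAE, and R²) are conventionally computed from true and predicted rental counts. It uses only the Python standard library. The function name `regression_metrics` and the sample numbers are hypothetical and are not the authors' code or data. In Spark ML, the same quantities are typically obtained from `RegressionEvaluator` with `metricName` set to `"rmse"`, `"mae"`, or `"r2"`. This is an illustrative sketch, not a reconstruction of the paper's pipeline.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Return RMSE, MAE and R^2 for paired true/predicted values.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    # Residual for each sample: observed minus predicted.
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)
    # RMSE: square root of the mean squared residual.
    rmse = math.sqrt(ss_res / n)
    # MAE: mean absolute residual.
    mae = sum(abs(r) for r in residuals) / n
    # R^2 = 1 - SS_res / SS_tot, where SS_tot is the spread around the mean label.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    return {"rmse": rmse, "mae": mae, "r2": r2}


if __name__ == "__main__":
    # Made-up hourly rental counts, NOT the UCI data used in the paper.
    actual = [100.0, 150.0, 200.0, 250.0]
    predicted = [110.0, 140.0, 210.0, 240.0]
    print(regression_metrics(actual, predicted))
```

With these made-up values every residual is ±10, so RMSE and MAE both equal 10 and R² is about 0.968. Lower RMSE and MAE and an R² closer to 1 indicate a better fit, which is the basis on which the abstract ranks random forest first.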
Authors
YIN Lifeng; LI Zhao (College of Software, Dalian Jiaotong University, Dalian 116028, China)
Source
Electronic Design Engineering, 2023, No. 8, pp. 5-9 (5 pages)
Funding
National Natural Science Foundation of China (61771087).