Abstract
Historical data used for sales forecasting is sparse and volatile, so traditional statistical and machine learning prediction algorithms perform poorly when the forecast horizon is long. To address this, this paper combines the ensemble idea of Random Forest (RF) with random partitioning and recombination of the training data set, and proposes an RF algorithm based on data integration. The algorithm uses random recombination to reconstruct the original one-dimensional prediction variable into high-dimensional variables and takes the sum of the outputs as the final predicted value. Experimental results show that, compared with traditional algorithms such as ARIMA, RF, and GBDT, the algorithm significantly improves prediction performance on a real-world data set. Extended experiments further show that data integration can also be applied to the ARIMA algorithm, improving its prediction accuracy by about 3%.
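The abstract gives only a high-level description, so the following is a minimal sketch of one possible reading of "random recombination followed by summation", not the authors' implementation. It assumes the one-dimensional target is a per-period sales total built from individual records, randomly assigns those records to `k` groups to form `k` component series, forecasts each component, and sums the forecasts. The function names, the `(period, amount)` record layout, and the parameter `k` are all hypothetical, and a plain moving average stands in for the random forest learner.

```python
import random


def split_into_components(records, n_periods, k, rng):
    """Randomly assign each (period, amount) record to one of k groups.

    Returns k per-period series whose element-wise sum is the original
    aggregate series. (Hypothetical reading of "random recombination".)
    """
    series = [[0.0] * n_periods for _ in range(k)]
    for period, amount in records:
        series[rng.randrange(k)][period] += amount
    return series


def moving_average_forecast(series, window=3):
    """Stand-in base forecaster (NOT a random forest): mean of the last values."""
    tail = series[-window:]
    return sum(tail) / len(tail)


def integrated_forecast(records, n_periods, k=4, seed=0):
    """Forecast each component series separately and sum the outputs."""
    rng = random.Random(seed)
    components = split_into_components(records, n_periods, k, rng)
    return sum(moving_average_forecast(c) for c in components)


if __name__ == "__main__":
    # Toy records: (period index, sales amount). Purely illustrative data.
    toy = [(0, 1.0), (0, 2.0), (1, 3.0), (2, 4.0), (2, 1.0), (3, 2.0)]
    print(integrated_forecast(toy, n_periods=4))
```

Because a moving average is linear, the summed forecast here equals the moving average of the aggregate series, so the split changes nothing. The decomposition can only affect the result when the base forecaster is nonlinear, as a random forest is.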
Authors
谢坤
容钰添
胡奉平
陈桓
姚小龙
XIE Kun; RONG Yutian; HU Fengping; CHEN Huan; YAO Xiaolong (Research and Development Center of Big Data and Blockchain, SF Technology Co., Ltd., Shenzhen, Guangdong 518000, China)
Source
《计算机工程》
CAS
CSCD
PKU Core Journals (北大核心)
2020, Issue 12, pp. 290-298 (9 pages)
Computer Engineering
Funding
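Shenzhen Development and Reform Commission Special Fund for the Development of Strategic Emerging Industries, project "Research, Development, and Industrialization of an Intelligent Logistics System Based on Artificial Intelligence Technology".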
Keywords
sales forecasting
time series prediction
machine learning
data integration
Random Forest (RF)