Abstract
This study proposes a forecasting model that combines K-means clustering with machine learning regression algorithms to forecast sales of multiple commodities in the retail industry. First, cluster analysis identifies commodities with similar sales patterns, and the dataset is partitioned accordingly. Then three regression models (support vector regression, random forest, and XGBoost) are trained on each sub-dataset. Constructing a data pool increases both the amount of data available for training and the range of candidate predictor variables. The proposed models are validated on a real sales dataset from a retail company. The experimental results show that the model based on K-means and support vector regression performs best, and that the proposed models clearly outperform both the benchmark models and the machine learning models trained without clustering.
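The cluster-then-regress idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy sales figures, the mean-scaling step, the farthest-point initialisation, and every function name are assumptions made for illustration. A least-squares lag-1 regression stands in for the support vector regression, random forest, and XGBoost models named in the abstract, so that the example runs with the Python standard library alone.

```python
def normalize(series):
    """Scale a series by its mean so clustering compares shape, not volume.
    Assumes strictly positive mean sales (an assumption of this sketch)."""
    mean = sum(series) / len(series)
    return [x / mean for x in series]


def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iters=20):
    """Lloyd's K-means with deterministic farthest-point initialisation.
    Returns one cluster label per point."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each centre moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def fit_lag1(pairs):
    """Ordinary least squares for y_t = a + b * y_{t-1} (stand-in regressor)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    b = sxy / sxx if sxx else 0.0
    return my - b * mx, b


def fit_cluster_models(sales, k):
    """Cluster items by sales shape, then fit one pooled model per cluster."""
    names = list(sales)
    shapes = [normalize(sales[n]) for n in names]
    labels = kmeans(shapes, k)
    models = {}
    for c in set(labels):
        # Data pool: lagged pairs from every item in the cluster.
        pool = [(s[t - 1], s[t])
                for s, lab in zip(shapes, labels) if lab == c
                for t in range(1, len(s))]
        models[c] = fit_lag1(pool)
    return dict(zip(names, labels)), models


def forecast_next(series, model):
    """One-step-ahead forecast, mapped back to the item's own sales scale."""
    a, b = model
    mean = sum(series) / len(series)
    return (a + b * series[-1] / mean) * mean


# Toy data (hypothetical): two rising items and two falling items.
sales = {
    "A": [10, 12, 14, 16, 18, 20],
    "B": [100, 120, 140, 160, 180, 200],
    "C": [20, 18, 16, 14, 12, 10],
    "D": [200, 180, 160, 140, 120, 100],
}
labels, models = fit_cluster_models(sales, k=2)
for name, series in sales.items():
    print(name, labels[name], round(forecast_next(series, models[labels[name]]), 2))
```

Pooling the lagged observations of all items in a cluster gives each per-cluster model more training rows than any single item could supply, which is the motivation the abstract gives for the data pool. In a fuller version, `fit_lag1` would be replaced by the regressors the abstract names, and the pool could include additional predictor variables.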
Authors
ZHOU Yu (周雨)
DUAN Yong-Rui (段永瑞)
School of Economics and Management, Tongji University, Shanghai 200092, China
Source
Computer Systems & Applications (《计算机系统应用》)
2021, Issue 11, pp. 188-194 (7 pages)
Funding
National Natural Science Foundation of China (71771179, 71532015).
Keywords
retail industry
sales forecasting
time series
machine learning
clustering