Abstract
This paper collects data on used sailboats of different types: manufacturer, year, length, beam, draft, displacement, load waterline, sail area, average cargo throughput, sale region, and hull type (monohull or catamaran). It also collects the per capita GDP, average temperature, and average precipitation of each sale region. Outliers are identified and removed from the data set, and the listing price is log-transformed so that its distribution is closer to normal. Four models are built to estimate the log price of a used sailboat, based on random forest, XGBoost, GBDT, and LightGBM, with hyperparameters tuned by grid search. The random forest reaches an estimation accuracy of 96.61%, and the other three models each exceed 95%. The four individual models are then combined by Stacking, which yields a fusion model with a better fit: its mean squared error on the log price is close to 0.01. The fusion model can be used to price used sailboats reasonably.
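The pipeline described in the abstract (log-transform the price, tune each base model by grid search, then combine them by Stacking) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the data are synthetic, the feature columns and parameter grids are assumptions, and only two scikit-learn base learners stand in for the four models named in the abstract. XGBoost and LightGBM regressors could be added to the `base_models` list in the same way.

```python
import numpy as np
from sklearn.ensemble import (
    GradientBoostingRegressor,
    RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import RidgeCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (NOT the paper's data set). The four columns play
# the role of length, beam, draft and age; all coefficients are made up.
rng = np.random.default_rng(0)
n = 400
X = rng.uniform([8.0, 3.0, 0.5, 0.0], [20.0, 9.0, 3.0, 30.0], size=(n, 4))
log_price_true = 9.0 + 0.15 * X[:, 0] + 0.10 * X[:, 1] - 0.03 * X[:, 3]
price = np.exp(log_price_true + rng.normal(0.0, 0.1, size=n))

# Log-transform the listing price so the target is closer to normal.
y = np.log(price)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def tuned(estimator, grid):
    """Tune an estimator's hyperparameters by grid search with 3-fold CV."""
    search = GridSearchCV(estimator, grid, cv=3, scoring="neg_mean_squared_error")
    search.fit(X_train, y_train)
    return search.best_estimator_


# Base learners, each tuned separately (the grids are illustrative assumptions).
base_models = [
    ("rf", tuned(RandomForestRegressor(random_state=0),
                 {"n_estimators": [50, 100], "max_depth": [None, 8]})),
    ("gbdt", tuned(GradientBoostingRegressor(random_state=0),
                   {"n_estimators": [100, 200], "learning_rate": [0.05, 0.1]})),
]

# Stacking: a meta-learner is fitted on out-of-fold predictions of the base
# models. RidgeCV as the meta-learner is an assumption of this sketch.
stack = StackingRegressor(estimators=base_models, final_estimator=RidgeCV(), cv=3)
stack.fit(X_train, y_train)

pred_log = stack.predict(X_test)
mse_log = mean_squared_error(y_test, pred_log)  # error measured on the log price
pred_price = np.exp(pred_log)                   # back-transform to a price
print(f"MSE on log price: {mse_log:.4f}")
```

The error is measured on the log price, matching the metric quoted in the abstract; exponentiating the prediction recovers a price estimate in the original units.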
Source
Advances in Applied Mathematics (《应用数学进展》)
2023, Issue 9, pp. 4006-4012 (7 pages)