Ensemble learning prediction model for rapeseed flowering periods incorporating virtual sample generation

Abstract: Statistical and linear regression models cannot fully capture the complex nonlinear relationships between the flowering period and its influencing factors, and rapeseed flowering-period samples are scarce. To address both problems, this study proposes an ensemble learning algorithm combined with virtual sample generation to predict the rapeseed flowering period. Using full-bloom dates of rapeseed for 1999-2023 and meteorological data for 1998-2023 from Longyou County, Quzhou City, Zhejiang Province, the original samples were expanded separately with a Gaussian mixture model-based virtual sample generation (GMM-VSG) algorithm and with cubic spline interpolation. Eight machine learning algorithms were used for modeling, with hyperparameters tuned by a Bayesian optimizer. Finally, different combinations of the eight algorithms were assembled with the Stacking ensemble learning method to build the flowering-period prediction model. The results show that, compared with the original dataset, the two expanded datasets generated by cubic spline interpolation and by the Gaussian mixture model significantly improved the performance of all the machine learning algorithms, and the dataset generated by cubic spline interpolation performed best. Stacking improved model accuracy. The best model was SRX_L, which uses kernel ridge regression (KRR), support vector regression (SVR), and extreme gradient boosting (XGBoost) as base models and linear regression as the meta-model; its mean absolute error, root mean square error, and coefficient of determination were 0.1056 d, 0.1227 d, and 0.9997, respectively. These results provide an effective method for accurately predicting the rapeseed flowering period.

Linear regression cannot fully reveal the complex nonlinear relationships between the flowering period and its influencing factors, and flowering-period samples are scarce. In this study, ensemble learning incorporating virtual sample generation was proposed to predict the flowering period of rapeseed. Full-bloom records of rapeseed and meteorological data from Longyou County, Quzhou City, Zhejiang Province, China, from 1998 to 2023 were used. The original samples were expanded using Gaussian Mixture Model-based Virtual Sample Generation (GMM-VSG) and Cubic Spline Interpolation, yielding two new datasets of 985 samples each. Models were established using eight machine learning methods: Random Forest (RF), Kernel Ridge Regression (KRR), Ridge Regression (RR), Least Absolute Shrinkage and Selection Operator (Lasso), Support Vector Regression (SVR), Extreme Gradient Boosting (XGBoost), Light Gradient Boosting Machine (LightGBM), and Gradient Boosting Decision Tree (GBDT). Hyperparameters were optimized using a Bayesian optimizer. Finally, a prediction model for the rapeseed flowering period was established using Stacking ensemble learning.

The vast majority of models performed better on the Cubic interpolation dataset than on the original and GMM-VSG datasets. Specifically, the RF model achieved an RMSE of 0.679 d, an MAE of 0.351 d, and an R² of 0.990, a significant improvement over both the original dataset (RMSE 6.286 d, MAE 5.028 d, R² 0.201) and the GMM-VSG dataset (RMSE 2.680 d, MAE 1.588 d, R² 0.881). The SVR model also performed better on the Cubic dataset than before the expansion, with an RMSE of 0.849 d, an MAE of 0.333 d, and an R² of 0.984. LightGBM, itself an ensemble method, performed best on the Cubic dataset, with the lowest RMSE (0.613 d), an MAE of 0.336 d, and the highest R² (0.992). This result reflects its strong feature learning and noise resistance, which help it capture the complex relationships within the dataset. In contrast, the Lasso and RR models did not improve significantly on the Cubic dataset. For instance, Lasso had an RMSE of 3.879 d and an MAE of 3.054 d on the Cubic dataset. Although these errors were lower than on the original dataset (RMSE 6.329 d, MAE 5.567 d), a substantial gap remained relative to the other models.

Five models were developed using the Stacking ensemble learning approach: SRX_L, All_L, SLL_L, SRL_L, and SRK_L. Among them, the SRX_L model performed best across the metrics, achieving the highest R² (0.9997) and the lowest RMSE and MAE of all models, at 0.1227 d and 0.1056 d, respectively. The fitted and actual flowering periods followed generally consistent trends. Predictive accuracy was high in most years, particularly 2001, 2011, and 2014, when the predictions closely matched the actual data, with discrepancies sometimes less than 0.01 or even approaching zero. However, some years showed larger differences, such as 1999 and 2023; the largest error, 0.4421 d, occurred in 1999. The maximum actual flowering period occurred in 2005, reaching 92 days, with an error between the predicted and actual values of 0.0416 d. The minimum actual flowering period was observed in 2020, at 63 days, with an error between the predicted and actual values of 0.1325 d. The model can therefore predict the extreme values with high accuracy.

Virtual sample generation is suitable for small datasets: it significantly improved the predictive accuracy and generalizability of the model while reducing the cost and difficulty of data collection. Compared with single machine learning models, Stacking ensemble learning can substantially improve predictive performance. Stacking ensemble learning is well suited to complex tasks with nonlinear relationships, such as predicting the flowering period of rapeseed.
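The cubic spline expansion step can be illustrated with a short sketch. The abstract does not give the authors' implementation, so everything here is an assumption: a natural (zero end-curvature) spline, even resampling inside each interval, the hypothetical helper names `natural_cubic_spline` and `densify`, and the toy input values. As a plausibility check only, 25 annual records resampled with 41 steps per interval would give 24 × 41 + 1 = 985 points, which matches the reported dataset size, although the abstract does not say the samples were produced this way.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Build the natural cubic spline through the knots (xs[i], ys[i]).

    xs must be strictly increasing. Returns a function S(x).
    """
    n = len(xs) - 1  # number of intervals
    if n < 1:
        raise ValueError("need at least two knots")
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # m[i] is the second derivative at knot i; natural ends: m[0] = m[n] = 0.
    m = [0.0] * (n + 1)
    if n >= 2:
        # Tridiagonal system for the interior knots 1..n-1 (row k <-> knot k+1).
        diag = [2.0 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [
            6.0 * ((ys[k + 2] - ys[k + 1]) / h[k + 1] - (ys[k + 1] - ys[k]) / h[k])
            for k in range(n - 1)
        ]
        # Thomas algorithm: forward elimination, then back substitution.
        for k in range(1, n - 1):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        for k in range(n - 2, -1, -1):
            m[k + 1] = (rhs[k] - h[k + 1] * m[k + 2]) / diag[k]

    def spline(x):
        # Locate the interval [xs[i], xs[i+1]] containing x (clamped at the ends).
        i = min(max(bisect_right(xs, x) - 1, 0), n - 1)
        a, b = xs[i + 1] - x, x - xs[i]
        return (
            (m[i] * a**3 + m[i + 1] * b**3) / (6.0 * h[i])
            + (ys[i] / h[i] - m[i] * h[i] / 6.0) * a
            + (ys[i + 1] / h[i] - m[i + 1] * h[i] / 6.0) * b
        )

    return spline


def densify(xs, ys, steps_per_interval):
    """Resample the spline at evenly spaced points inside every interval.

    Returns n_intervals * steps_per_interval + 1 samples (knots included).
    """
    s = natural_cubic_spline(xs, ys)
    dense_x = []
    for i in range(len(xs) - 1):
        step = (xs[i + 1] - xs[i]) / steps_per_interval
        dense_x.extend(xs[i] + j * step for j in range(steps_per_interval))
    dense_x.append(xs[-1])
    return dense_x, [s(x) for x in dense_x]


if __name__ == "__main__":
    # Toy values invented for illustration; not data from the study.
    years = [1999, 2000, 2001, 2002, 2003]
    bloom_day = [80.0, 76.0, 85.0, 79.0, 82.0]
    dense_years, dense_days = densify(years, bloom_day, 4)
    print(len(dense_years))  # 4 intervals * 4 steps + 1 = 17
```

Interpolated points of this kind lie on a smooth curve through the observed records, so they add density rather than independent observations. That is worth keeping in mind when reading error metrics computed on an expanded dataset.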

Authors: XIE Qianwei; XUE Fengchang; CHEN Jianfei (Meteorological Disaster Geographic Information Engineering Laboratory, Nanjing University of Information Science & Technology, Nanjing 210044, China; Guangxi Zhuang Autonomous Region Lightning Protection Center, Nanning 530000, China)

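Affiliations: Meteorological Disaster Geographic Information Engineering Laboratory, Nanjing University of Information Science & Technology; Guangxi Zhuang Autonomous Region Lightning Protection Center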

Source: Transactions of the Chinese Society of Agricultural Engineering (农业工程学报), indexed in EI, CAS, CSCD, and the Peking University Core Journals list; 2024, No. 19, pp. 159-167 (9 pages)

Funding: Guangxi Key Research and Development Program (桂科AB22080101); Open Fund of the Nanchang Key Laboratory of Agricultural Meteorology (2019NNZS102).

Keywords: ensemble learning; virtual sample generation; flowering period prediction; rapeseed; Stacking

Classification code: S565.4 [Agricultural Sciences: Crop Science]
