Abstract
Conventional chemical methods for determining the moisture content of maize seeds involve complex procedures, long turnaround times, and high costs. To address this, a rapid, nondestructive method for predicting maize seed moisture content is proposed, based on ensemble learning and near-infrared (NIR) spectroscopy. A total of 320 maize seed samples from 8 varieties, including 'Shaanke 9', were studied, and their NIR diffuse reflectance spectra were collected with a near-infrared spectrometer (Antaris II, Nicolet, USA). Using partial least squares regression (PLS) as a common benchmark, Savitzky-Golay (SG) smoothing combined with each of four spectral preprocessing methods was compared; SG smoothing combined with multiplicative scatter correction gave the best denoising effect. Competitive adaptive reweighted sampling (CARS) was used to extract feature wavelengths, and the cumulative contribution rate of the first 7 spectral features exceeded 92%. Gradient Boosting Decision Tree (GBDT), Random Forest (RF), and Extreme Gradient Boosting (XGB, or XGBoost) served as base models, and Stacking was used as the fusion strategy to build a Stacking ensemble learning model. From the preprocessed data, the first 7 principal components were extracted as feature vectors, and the moisture contents of the seeds measured by the direct drying method served as labels. Four moisture content prediction models were trained separately and their performance indicators were compared. After 2163 rounds of training, the Stacking ensemble model reached a prediction correlation coefficient RP of 0.9391 and a relative analysis error RPD of 2.91. The results show that the Stacking ensemble model combines the strengths of the three base models (GBDT, RF, and XGB), offering high accuracy, good convergence, and strong generalization, and thus provides a new approach to the rapid, nondestructive determination of maize seed moisture content.
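The abstract reports a prediction correlation coefficient and a relative analysis error without giving their formulas. The sketch below shows how these two indicators are conventionally computed in the NIR literature. It assumes that the correlation coefficient is the Pearson correlation between measured and predicted values on the prediction set, and that the relative analysis error (commonly abbreviated RPD) is the sample standard deviation of the measured values divided by the root mean square error of prediction. The function names and the sample data are hypothetical and are not taken from the paper.

```python
import math
from statistics import mean, stdev


def pearson_r(measured, predicted):
    """Pearson correlation coefficient between measured and predicted values."""
    m_bar, p_bar = mean(measured), mean(predicted)
    cov = sum((m - m_bar) * (p - p_bar) for m, p in zip(measured, predicted))
    var_m = sum((m - m_bar) ** 2 for m in measured)
    var_p = sum((p - p_bar) ** 2 for p in predicted)
    return cov / math.sqrt(var_m * var_p)


def rmsep(measured, predicted):
    """Root mean square error of prediction."""
    n = len(measured)
    return math.sqrt(sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n)


def rpd(measured, predicted):
    """Assumed definition: sample SD of the measured values divided by RMSEP."""
    return stdev(measured) / rmsep(measured, predicted)


# Hypothetical moisture contents (%), invented for illustration only.
measured = [10.0, 11.0, 12.0, 13.0, 14.0]
predicted = [10.5, 10.5, 12.0, 13.5, 13.5]

r_p = pearson_r(measured, predicted)
rpd_value = rpd(measured, predicted)
print(f"R_P = {r_p:.4f}, RPD = {rpd_value:.2f}")
```

Under these assumed definitions, the relative analysis error rises as the prediction error shrinks relative to the spread of the reference values, so a larger value indicates a more useful model.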
Authors
YANG Lin
ZHANG Lin
YE Zehui
(College of Electronic Information and Electrical Engineering, Shangluo University, Shangluo, Shaanxi 726000, China; Shaanxi Shangdan Gaoxin School, Shangluo, Shaanxi 726000, China)
Source
Acta Agriculturae Boreali-occidentalis Sinica (《西北农业学报》)
CAS
CSCD
PKU Core (北大核心)
2022, No. 8, pp. 1025-1034 (10 pages)
Funding
Shangluo University Scientific Research Project (18SKY-FWDF001).