Abstract
To address the complex procedures and high cost of traditional chemical methods for determining blueberry storage quality, a non-destructive detection method based on ensemble learning and near-infrared spectroscopy was proposed. Using 150 Rika blueberry samples and 30 Green Emerald blueberry samples from Dandong as research objects, near-infrared reflectance spectra of Rika blueberries at different storage times and Green Emerald blueberries at different maturity levels were collected with a near-infrared spectrometer. The sample set partitioning based on joint X-Y distance (SPXY) method was used to divide the Rika samples into training and validation sets at a ratio of 4:1, and the Green Emerald samples served as the test set. Using partial least squares regression (PLSR) throughout, the preprocessing effects of standard normal variate transformation (SNV), Z-score standardization (Z-score), first derivative (1st-D), and second derivative (2nd-D), applied singly or in combination to the raw spectra, were compared. Competitive adaptive reweighted sampling (CARS) was used to select the characteristic wavelengths of the blueberry near-infrared spectra. Support vector regression (SVR), extreme gradient boosting (XGBoost), and multilayer perceptron (MLP) served as base models and were combined with the stacking strategy into a stacking ensemble learning model. Vitamin C, soluble solids content (SSC), and anthocyanins, the indicators most closely related to blueberry storage quality, were used as labels, and four prediction models were trained for each. The stacking ensemble model performed best: on the test set, the coefficients of determination (R²) for vitamin C, SSC, and anthocyanins were 0.8726, 0.8814, and 0.9055; the root mean square errors (RMSE) were 0.5664, 0.6963, and 1.6939; and the relative percent deviations (RPD) were 2.8016, 2.9037, and 3.253, respectively. The results show that the proposed stacking ensemble learning model combines the strengths of SVR, XGBoost, and MLP, offers high accuracy, good stability, and strong generalization ability, and can provide new ideas for research on non-destructive detection of blueberries.
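The SNV preprocessing step and the three evaluation metrics named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the abstract does not give formulas, so the sketch assumes SNV scales each spectrum by its own sample standard deviation and that RPD is the ratio of the reference-value standard deviation to the RMSE, a common definition. All function names are hypothetical. Under these assumptions, and with the same divisor used for the standard deviation and the RMSE, R² = 1 - 1/RPD², and the figures reported in the abstract are consistent with that relation.

```python
import math
import statistics


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it by its own SD.

    Assumption: sample SD (n - 1 divisor); some implementations use n instead.
    """
    mean = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mean) / sd for x in spectrum]


def rmse(y_true, y_pred):
    """Root mean square error between reference and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(statistics.fmean(squared))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rpd(y_true, y_pred):
    """Assumed definition: population SD of reference values divided by RMSE."""
    return statistics.pstdev(y_true) / rmse(y_true, y_pred)


def r2_from_rpd(rpd_value):
    """R^2 implied by an RPD value when SD and RMSE share the same divisor."""
    return 1.0 - 1.0 / rpd_value ** 2


if __name__ == "__main__":
    # Toy values for illustration only; these are not data from the study.
    y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
    y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
    print("RMSE:", rmse(y_true, y_pred))
    print("R2:  ", r2(y_true, y_pred))
    print("RPD: ", rpd(y_true, y_pred))
    # Cross-check of the RPD values reported in the abstract.
    for name, value in [("vitamin C", 2.8016), ("SSC", 2.9037), ("anthocyanins", 3.253)]:
        print(name, round(r2_from_rpd(value), 4))
```

The population standard deviation is used in `rpd` so that the relation with R² holds exactly; with a sample standard deviation the two differ by a factor of (n - 1)/n, which matters for a test set of 30 samples.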
Authors
ZHANG Chen (张晨)
ZHU Yujie (朱玉杰)
FENG Guohong (冯国红)
College of Engineering and Technology, Northeast Forestry University, Harbin 150040, China
Source
Food and Fermentation Industries (《食品与发酵工业》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2023, Issue 18, pp. 306-314 (9 pages)
Funding
Natural Science Foundation of Heilongjiang Province (LH2020C050).
Keywords
near-infrared spectrum
ensemble learning
blueberries
non-destructive testing
support vector regression
extreme gradient boosting
multilayer perceptron