Abstract
To identify corn varieties rapidly, a classification model was built by combining a support vector machine (SVM) with near-infrared spectroscopy. A total of 293 samples from five varieties (Zhengdan 958, Xianyu 335, Jingke 968, Denghai 605 and Demeiya) were studied. The collected near-infrared spectra were preprocessed with the standard normal variate (SNV) transformation, and principal component analysis (PCA) was then used to reduce the dimensionality of the spectral data. At a ratio of 6:1, 251 samples were randomly selected as the training set and 42 as the test set, in order to examine the influence of the Bayesian optimization (BO) algorithm on SVM performance. Three methods, grid search (GS), a genetic algorithm (GA) and BO, were used to optimize two key SVM parameters: the penalty factor C and the radial basis function kernel parameter γ. For each method, the C and γ giving the highest ten-fold cross-validation recognition accuracy were used as modeling parameters to build an SVM classification model, and the BO-based model was compared with the GS- and GA-based models. The BO-optimized SVM model clearly outperformed the models obtained with the other two algorithms, reaching a recognition accuracy of 100% on the test set. This indicates that the parameters found by BO are globally optimal, whereas those found by the other two algorithms may have been trapped in local optima, leading to poorer model performance. BO-SVM models were also built on the spectral data before and after PCA dimensionality reduction; the results show that BO optimizes poorly on high-dimensional data and is better suited to low-dimensional data. To address the poor performance caused by unequal numbers of samples per class, the two classes with the fewest samples, Zhengdan 958 and Xianyu 335, were removed, and the SVM models were rebuilt on the remaining three classes (248 samples in total). After removal, the test-set performance of every model improved, indicating that under class imbalance, the more samples a class has, the more finely the model parameters are adjusted and the better the model fits that class. The results can be used for rapid identification of corn varieties and can also serve as a reference for near-infrared-based classification and origin identification of other agricultural products.
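The abstract names SNV preprocessing and a random 251/42 split without spelling either out. The sketch below illustrates both using only the Python standard library. It is not the authors' code: the function names `snv` and `random_split`, the seed, and the sample absorbance values are hypothetical, and the sample standard deviation (n - 1 denominator) is an assumed convention. The abstract also does not say whether the split was stratified by variety; a plain random shuffle is assumed here.

```python
import math
import random


def snv(spectrum):
    """Standard normal variate: centre one spectrum and scale it to unit standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Assumption: sample standard deviation with n - 1 in the denominator.
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    return [(x - mean) / std for x in spectrum]


def random_split(n_samples, n_test, seed=0):
    """Randomly partition sample indices into (train, test) index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


# Hypothetical absorbance values for one spectrum (not data from the paper).
raw = [0.42, 0.45, 0.51, 0.60, 0.58, 0.49]
corrected = snv(raw)

# 293 samples split into 251 training and 42 test samples, as stated in the abstract.
train_idx, test_idx = random_split(293, 42)
print(len(train_idx), len(test_idx))  # prints: 251 42
```

SNV is applied to each spectrum independently, so it can be computed before the split without leaking information from the test set; PCA and the search over C and γ would instead be fitted on the training set only.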
Authors
FENG Rui-jie (冯瑞杰)
CHEN Zheng-guang (陈争光)
YI Shu-juan (衣淑娟)
Affiliations: College of Information and Electrical Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China; Technology Innovation Center for Heilongjiang Modern Agricultural Internet of Things, Daqing 163319, China; Heilongjiang Engineering Technology Research Center for Rice Ecological Seedings Device and Whole Process Mechanization, Daqing 163319, China
Source
Spectroscopy and Spectral Analysis (《光谱学与光谱分析》)
SCIE
EI
CAS
CSCD
Peking University Core Journal (北大核心)
2022, No. 6, pp. 1698-1703 (6 pages)
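Funding
National Key Research and Development Program of China (2016YFD0701300)
Basic Scientific Research Funds for Provincial Universities of Heilongjiang Province (ZRCPY201913)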
Keywords
Near-infrared spectroscopy
Corn
Bayesian optimization
Principal component analysis
Support vector machine