Abstract
Objective: To enrich the methods available for automatically identifying aquatic product species from non-morphological features with small sample sizes. Methods: A database of lead, cadmium, mercury, and inorganic arsenic contents in aquatic products was built using mass spectrometry and spectroscopy detection techniques. For data preprocessing, a Gini index-based decision tree algorithm was used to discretize the data and expand the feature set, and an information entropy-based random forest algorithm was used to select the main features. A multi-class prediction model for fish, shrimp, crab, and shellfish and a binary classification model for shellfish species traceability were then built with several machine learning algorithms, and the models were tuned by random search. Results: In the shellfish binary classification, the random forest, extremely randomized trees, and decision tree algorithms reached a prediction accuracy of 91.67% and an area under the curve (AUC) of 91.67%. In the multi-class prediction of aquatic products, the extreme gradient boosting algorithm reached a test-set accuracy of 82.14% and an AUC of 93.43%. Discretization expanded the number of features sixfold, and the accuracy of the shellfish binary classification improved by up to 12.53%, with the largest gain in the random forest model. Conclusion: The prediction models, which combine spectroscopy and mass spectrometry data with tree-based ensemble learning algorithms and several preprocessing methods, showed good accuracy and stability and offer a reference method for the automatic identification of aquatic product species.
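The abstract does not say exactly how the Gini index-based discretization was carried out. As a rough illustration only, the sketch below finds the single cut point on one continuous feature that minimizes the weighted Gini impurity, as a depth-1 decision tree would, and then maps the feature to a binary bin. The function names, the cadmium values, and the labels are all invented for this example and are not taken from the paper.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_gini_threshold(values, labels):
    """Return (cut, score) for the cut that minimizes weighted Gini impurity.

    Candidate cuts are midpoints between consecutive distinct sorted
    values, as in a depth-1 decision tree (a "stump").
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_cut, best_score = None, float("inf")
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_cut, best_score = (lo + hi) / 2, score
    return best_cut, best_score


def discretize(values, cut):
    """Map each value to bin 0 (<= cut) or bin 1 (> cut)."""
    return [0 if v <= cut else 1 for v in values]


# Hypothetical concentrations and labels, invented for illustration only.
cadmium = [0.02, 0.05, 0.04, 0.60, 0.75, 0.90]
species = ["fish", "fish", "fish", "shellfish", "shellfish", "shellfish"]

cut, impurity = best_gini_threshold(cadmium, species)
cadmium_bin = discretize(cadmium, cut)  # new discrete feature alongside the original
```

Appending a binned column like `cadmium_bin` next to each original measurement is one way discretization can enlarge a feature set. The sixfold expansion reported in the abstract presumably involves more bins or more derived columns per element, which this sketch does not attempt to reproduce.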
Authors
梁怀新
王浩然
刘斌
赵慧琴
杨辉
郑存芳
刘宁
LIANG Huaixin; WANG Haoran; LIU Bin; ZHAO Huiqin; YANG Hui; ZHENG Cunfang; LIU Ning (Clinical Laboratory, Qinhuangdao Center for Disease Control and Prevention, Qinhuangdao, Hebei 066004, China; affiliation not given)
Source
《医学动物防制》
2024, Issue 10, pp. 1020-1023 (4 pages)
Journal of Medical Pest Control
Funding
Key Science and Technology Research Program of the Hebei Provincial Health Commission (20231899).
Keywords
Machine learning
Aquatic product
Modeling
Classification
Small sample