Abstract
To address the decline in classification performance caused by imbalanced samples in the grade classification of strong-flavor (Nongxiangxing) Baijiu base liquor from gas chromatography-mass spectrometry (GC-MS) data, a classification method for imbalanced data sets was proposed. First, the synthetic minority over-sampling technique (SMOTE) was used to expand the minority-class samples of strong-flavor base liquor and reduce the class imbalance. Then, sparse principal component analysis (SPCA) was applied to reduce the dimensionality of the GC-MS spectral data. Finally, a deep forest (DF) classifier was used to build the classification and recognition model for strong-flavor Baijiu base liquor. The results showed that balancing the base liquor data set with SMOTE effectively improved classification accuracy, and the resulting model reached an accuracy of 96.61%. The model can provide guidance and a reference for the grade classification of base liquor.
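The over-sampling step named in the abstract can be illustrated with a minimal NumPy sketch of SMOTE-style interpolation. This is not the authors' implementation: the function names `smote_oversample` and `balance`, the neighbour count `k=5`, and the toy data are all assumptions made for illustration, and the SPCA and deep forest stages are not shown.

```python
import numpy as np


def smote_oversample(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    Hypothetical helper for illustration; k=5 is an assumed default.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = X_min.shape[0]
    if n < 2:
        raise ValueError("need at least two minority samples")
    k = min(k, n - 1)

    # Pairwise Euclidean distances within the minority class.
    diff = X_min[:, None, :] - X_min[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)   # seed sample for each synthetic point
    pick = rng.integers(0, k, size=n_new)   # which of its k neighbours to use
    nb = neighbours[base, pick]
    gap = rng.random((n_new, 1))            # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


def balance(X, y, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for offset, (c, cnt) in enumerate(zip(classes, counts)):
        if cnt < target:
            X_new = smote_oversample(X[y == c], target - cnt, k=k, seed=seed + offset)
            X_parts.append(X_new)
            y_parts.append(np.full(target - cnt, c))
    return np.vstack(X_parts), np.concatenate(y_parts)


# Toy data standing in for GC-MS feature vectors (synthetic, not real measurements).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (40, 6)), rng.normal(3, 1, (8, 6))])
y = np.array([0] * 40 + [1] * 8)
X_bal, y_bal = balance(X, y)
print(np.bincount(y_bal))  # class counts after balancing
```

Each synthetic sample lies on the line segment between a minority sample and one of its nearest minority neighbours, so the minority class grows without exact duplication. In practice, over-sampling is normally applied to the training split only, so that synthetic points do not leak into the evaluation data.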
Authors
WANG Jihua (王继华)
LI Zhaofei (李兆飞)
YANG Zhuang (杨壮)
ZHAO Na (赵娜)
ZHANG Guiyu (张贵宇)
Affiliations: Artificial Intelligence Key Laboratory of Sichuan Province, Sichuan University of Science & Engineering, Yibin 644000, China; School of Automation and Information Engineering, Sichuan University of Science & Engineering, Yibin 644000, China
Source
China Brewing (《中国酿造》)
CAS
Peking University Core Journal (北大核心)
2024, Issue 1, pp. 184-189 (6 pages)
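Funding
Key Science and Technology Program of the Zigong Science and Technology Bureau, Sichuan Province (2019YYJC15)
Scientific Research Fund of Sichuan University of Science & Engineering (2020RC32)
Graduate Innovation Fund of Sichuan University of Science & Engineering (Y2022150)
Graduate Course Construction Project of Sichuan University of Science & Engineering (AL202213)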
Keywords
gas chromatography-mass spectrometry
strong-flavor Baijiu base liquor
synthetic minority over-sampling technique
sparse principal component analysis
base liquor classification