Abstract
Objective: To perform feature selection on Raman spectral data measured from Chinese medicines and to build a model that identifies pungent flavor. Methods: After sample pretreatment, 132 pungent and 156 non-pungent Chinese medicines were analyzed with a SEED 3000 Raman spectrometer. A Raman spectrum was obtained for each medicine and quantified at 1 cm⁻¹ intervals. Feature selection based on the random forest (RF) and extreme gradient boosting (XGBoost) algorithms was applied to the quantified data to screen out the Raman shifts, and their peak intensities, most closely related to pungent flavor. Identification models were then built with five classification algorithms, namely RF, K-nearest neighbors (KNN), gradient boosting machine (GBM), Gaussian naive Bayes (GNB), and adaptive boosting (AdaBoost), and each model was evaluated. Results: Compared with non-pungent Chinese medicines, pungent Chinese medicines showed high-intensity Raman scattering in the 2500 to 3000 cm⁻¹ range. The 100 Raman shifts ranked highest in importance by the RF and XGBoost algorithms were used as features. Among all models, the classifier built with the GBM algorithm performed best, with an area under the curve (AUC) of 0.978, an accuracy of 0.943, and a precision of 0.970. Conclusion: The Raman spectra of Chinese medicines correlate significantly with the pungent flavor property and can serve as a holistic quantitative characterization of it. Combined with the GBM algorithm, they allow efficient and accurate identification of this medicinal property.
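The three figures reported in the Results (AUC, accuracy, precision) are not defined in the abstract. The sketch below assumes the standard binary-classification definitions, with "pungent" as the positive class, and computes them in plain Python. The labels, scores, and the 0.5 decision threshold are hypothetical toy values chosen for illustration; they are not data or settings from the study.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def precision(y_true, y_pred, positive=1):
    """Of the samples predicted positive, the fraction that are truly positive."""
    truths_of_predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    if not truths_of_predicted_pos:
        return 0.0  # convention used here when nothing is predicted positive
    return sum(t == positive for t in truths_of_predicted_pos) / len(truths_of_predicted_pos)


def roc_auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive sample scores higher
    than a random negative sample (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = pungent, 0 = non-pungent.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.7, 0.2]  # model scores for class 1
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

if __name__ == "__main__":
    print(accuracy(y_true, y_pred))   # 0.75
    print(precision(y_true, y_pred))  # 0.75
    print(roc_auc(y_true, scores))    # 0.9375
```

Accuracy and precision depend on the chosen threshold, whereas AUC is computed from the scores alone, which is why a model can rank samples well (high AUC) and still misclassify some of them at a fixed cutoff.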
Authors
LI Wenyan (李文妍)
LIANG Hao (梁浩)
CHENG Hong (程虹)
ZHAO Ziwei (赵紫薇)
WANG Hui (王慧)
WANG Yun (王耘)
Affiliations: Research Center of TCM-Information Engineering, School of Chinese Materia Medica, Beijing University of Chinese Medicine, Beijing 102488, China; School of Life Sciences, Beijing University of Chinese Medicine, Beijing 102488, China
Source
World Chinese Medicine (《世界中医药》)
CAS
Peking University Core Journals (北大核心)
2024, Issue 13, pp. 1939-1945 (7 pages)
Funding
National Natural Science Foundation of China (No. 81973495).
Keywords
Chinese medicine property (中药药性)
Raman spectroscopy (拉曼光谱)
Pungent flavor (辛味)
Feature selection (特征筛选)
Random forest (随机森林)
Discrimination model (辨识模型)