To address the unclear absorption peaks caused by overlapping absorption peaks in traditional Chinese medicine (TCM) and other mixtures, a method is proposed that combines three unsupervised clustering algorithms, K-means, K-medoids and Fuzzy C-means (FCM), with first-derivative features of the terahertz absorption spectrum. The method is applied to cluster the terahertz spectra of Sanchi and three other kinds of TCM together with their easily confused products (ECPs). These unsupervised methods complement supervised classification methods, which require labeled training samples. The first derivative of the spectrum amplifies differences in absorption coefficient between substances, making otherwise indistinct absorption peaks evident. Experiments show that all three clustering algorithms achieve good results when the original absorption coefficient and its first derivative are combined as feature data, with K-means performing best at an accuracy of 95.32%. Compared with clustering on the absorption coefficient alone, accuracy improves significantly, especially for TCMs without distinct absorption peaks; the accuracy of K-means improves by 5.38%. In addition, the clustering algorithms in this study are highly robust to erroneous data.
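The abstract gives no implementation details. As a minimal sketch, assuming every spectrum is sampled on a common frequency grid and that "first derivative" means dα/df estimated by finite differences, the Python code below builds the combined feature vector (absorption coefficient followed by its first derivative) and clusters it with K-means. The farthest-point initialization, the hypothetical `clustering_accuracy` helper (best one-to-one mapping of cluster ids to classes), and the synthetic Gaussian-peak spectra are illustrative assumptions, not the authors' method or data; K-medoids and FCM are not shown.

```python
import math
from itertools import permutations


def first_derivative(freqs, alpha):
    """Finite-difference estimate of d(alpha)/d(f): central differences in the
    interior, one-sided differences at both ends; same length as alpha."""
    n = len(alpha)
    d = [0.0] * n
    d[0] = (alpha[1] - alpha[0]) / (freqs[1] - freqs[0])
    d[-1] = (alpha[-1] - alpha[-2]) / (freqs[-1] - freqs[-2])
    for i in range(1, n - 1):
        d[i] = (alpha[i + 1] - alpha[i - 1]) / (freqs[i + 1] - freqs[i - 1])
    return d


def build_features(freqs, alpha):
    """Concatenate the original absorption coefficient with its first derivative."""
    return list(alpha) + first_derivative(freqs, alpha)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with a deterministic farthest-point initialization (assumed)."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each feature vector goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def clustering_accuracy(true, pred, k):
    """Best accuracy over all one-to-one mappings of cluster ids to class ids."""
    best = 0
    for perm in permutations(range(k)):
        hits = sum(1 for t, p in zip(true, pred) if perm[p] == t)
        best = max(best, hits)
    return best / len(true)


def synthetic_spectrum(freqs, peak_thz, amp):
    """Hypothetical spectrum: linear baseline plus one Gaussian absorption peak."""
    return [2.0 * f + amp * math.exp(-((f - peak_thz) / 0.08) ** 2) for f in freqs]


def demo():
    freqs = [0.2 + 0.02 * i for i in range(91)]  # assumed 0.2-2.0 THz grid
    true = [0, 0, 0, 1, 1, 1]
    peaks = [0.8, 1.2]  # hypothetical peak positions of two sample classes
    amps = [9.0, 10.0, 11.0, 9.0, 10.0, 11.0]
    spectra = [synthetic_spectrum(freqs, peaks[t], a) for t, a in zip(true, amps)]
    features = [build_features(freqs, s) for s in spectra]
    pred = kmeans(features, k=2)
    return true, pred, clustering_accuracy(true, pred, 2)


if __name__ == "__main__":
    true, pred, acc = demo()
    print("predicted clusters:", pred, "accuracy:", acc)
```

In this toy example the derivative values are roughly an order of magnitude larger than α itself, so they dominate the Euclidean distance; with real spectra the two parts would likely need scaling or weighting, a choice the abstract does not describe.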
Funding: National Natural Science Foundation of China (No. 61675151)