Abstract
Rock spectra reflect the combined physical and chemical properties, composition, and structure of rocks, and are now widely used in rock classification research. Rock spectral data are difficult to collect and usually have to be gathered manually, which is labor-intensive and leaves the available data sets small. At the same time, rock spectral data are high-dimensional, so a classifier trained on a limited number of samples tends to suffer from the curse of dimensionality: classification accuracy falls as the feature dimension rises. Traditional rock spectral classification models therefore need a large number of training samples, usually several times the feature dimension, to classify well; when samples are scarce, the features must be reduced to reach acceptable accuracy. How to classify rock spectral data accurately from few samples has consequently become an active research topic. Using spectral data of typical rocks from the Xingcheng area of Liaoning Province, this paper builds a Siamese network classification model in Python from a small number of training samples, with Triplet Loss as the loss function, and implements a 3-way-1-shot classifier that reaches 97.8% classification accuracy on the test set. For comparison, four traditional machine learning methods (decision tree, random forest, support vector machine, and K-nearest neighbor) were trained on the same samples; their learning curves confirm that these methods do not classify well when samples are few. Because converting the original spectral data into image data does not affect the classification performance of the Siamese network model, rock spectral classification can be recast as an image classification problem and handled with image classification methods. The experimental results show that the Siamese network model still achieves excellent classification performance when few rock spectral samples are available, effectively compensating for the shortcomings of traditional machine learning models on small samples. Because its input data are paired, it also effectively reduces the overfitting caused by too few training samples.
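The two ideas the abstract names, a triplet loss and 3-way-1-shot classification, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not describe the network architecture, the distance measure, the margin, or the decision rule at inference time. The sketch assumes unsquared Euclidean distance, a margin of 1.0, and a nearest-support-example decision rule. The 2-D embeddings and the class labels `rock_A`, `rock_B`, `rock_C` are hypothetical stand-ins for the outputs of a trained embedding network.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet loss max(d(a, p) - d(a, n) + margin, 0).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is. The margin value is an
    assumption, not taken from the paper.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)


def classify_one_shot(query, support):
    """Return the label whose single support embedding is closest to the query.

    `support` maps each class label to the embedding of its one example,
    so three entries correspond to a 3-way-1-shot episode.
    """
    return min(support, key=lambda label: euclidean(query, support[label]))


# Hypothetical embeddings: one support example per class (3-way-1-shot).
support = {
    "rock_A": [0.0, 0.0],
    "rock_B": [3.0, 0.0],
    "rock_C": [0.0, 3.0],
}
query = [2.5, 0.4]
predicted = classify_one_shot(query, support)

# Well-separated triplet: d(a, p) = 1, d(a, n) = 4, so the loss is 0.
loss_easy = triplet_loss([0.0, 0.0], [0.0, 1.0], [4.0, 0.0])
# Violating triplet: d(a, p) = 5, d(a, n) = 1, so the loss is 5.
loss_hard = triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0])

print(predicted, loss_easy, loss_hard)
```

In this toy episode the query lies nearest the `rock_B` support embedding, so that label is returned. During training, only triplets with a positive loss (such as the second one) would contribute a gradient.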
Authors
肖志强
贺金鑫
陈德博
战晔
逯燕乐
XIAO Zhi-qiang; HE Jin-xin; CHEN De-bo; ZHAN Ye; LU Yan-le (College of Earth Sciences, Jilin University, Changchun 130061, China; Aviation University of Air Force, Changchun 130012, China)
Source
《光谱学与光谱分析》
SCIE
EI
CAS
CSCD
Peking University Core Journal
2024, Issue 2, pp. 558-562 (5 pages)
Spectroscopy and Spectral Analysis
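Funding
National Key Research and Development Program of China (2020YFA0714103)
Eighth Jilin University Interdisciplinary Cultivation Project for Young Teachers and Students (2022-JCXK-31)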
Keywords
Rock spectrum
Xingcheng, Liaoning
Supervised classification
Few-shot learning
Siamese neural network