Research on Effectiveness of the Pre-Training Model in Improving the Performance of Spectral Feature Extraction

Abstract: Advances in observation technology have produced massive volumes of spectral data. How to classify these data automatically has attracted wide attention from researchers, and the key step is spectral feature extraction. Given the limitations of manual processing, most mainstream studies use machine learning algorithms to extract spectral features. However, the high time and space complexity of these algorithms prevents them from handling massive spectral data. Pre-trained models that have emerged in recent years offer excellent feature extraction capabilities, but few studies have examined their effectiveness on spectral data. This paper therefore takes stellar spectral data as the research object and introduces pre-trained models such as BERT, ALBERT, and GPT, together with a convolutional neural network (CNN), to extract features from and classify stellar spectra; the effectiveness of these pre-trained models for stellar spectral feature extraction is assessed by comparing the experimental results. The spectral classification program is written in Python. On top of the features extracted by the pre-trained models, a CNN model in TensorFlow 1.14 determines the spectral type. The experiments use the SDSS DR10 stellar spectral dataset, which covers K-type, F-type, and G-type stars. Grid search and 5-fold cross-validation are used to obtain the optimal experimental parameters. Under the same training dataset conditions, BERT achieves the highest classification accuracy compared with ALBERT and GPT. In terms of average accuracy on the K-type, F-type, and G-type stellar datasets, BERT is higher than ALBERT by 0.0251, 0.0215, and 0.0225, respectively, and higher than GPT by 0.0497, 0.0424, and 0.0432, respectively. The experimental results support the following conclusions: (1) stellar spectral classification accuracy increases with the size of the training data; (2) for the same proportion of training data, a given model achieves its highest classification accuracy on the K-type dataset, followed by the F-type dataset, with the G-type dataset lowest; (3) BERT has better feature extraction capability than ALBERT and GPT.
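The abstract mentions that grid search and 5-fold cross-validation were used to select parameters but gives no details. The following is a minimal sketch of that general procedure, not the authors' implementation. It uses only the Python standard library. The k-nearest-neighbour classifier, its `n_neighbors` parameter, the candidate grid, and the synthetic two-class data are all assumptions introduced for illustration; they stand in for the paper's CNN classifier and the features extracted by the pre-trained models.

```python
import itertools
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Shuffle sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(fit_predict, X, y, params, k=5, seed=0):
    """Mean validation accuracy of fit_predict over k train/validation splits."""
    folds = k_fold_indices(len(X), k, seed)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except fold i, validate on fold i.
        train_idx = [j for f_i, fold in enumerate(folds) if f_i != i for j in fold]
        preds = fit_predict(
            [X[j] for j in train_idx],
            [y[j] for j in train_idx],
            [X[j] for j in val_idx],
            **params,
        )
        correct = sum(p == y[j] for p, j in zip(preds, val_idx))
        scores.append(correct / len(val_idx))
    return statistics.mean(scores)


def grid_search(fit_predict, X, y, grid, k=5):
    """Evaluate every parameter combination; return (best_params, best_score)."""
    best_params, best_score = None, -1.0
    keys = sorted(grid)
    for values in itertools.product(*(grid[key] for key in keys)):
        params = dict(zip(keys, values))
        score = cross_val_accuracy(fit_predict, X, y, params, k)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def knn_fit_predict(X_train, y_train, X_val, n_neighbors=1):
    """Hypothetical stand-in classifier (NOT the paper's CNN): k-NN majority vote."""
    preds = []
    for x in X_val:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, xt)), label)
            for xt, label in zip(X_train, y_train)
        )
        votes = [label for _, label in dists[:n_neighbors]]
        preds.append(max(set(votes), key=votes.count))
    return preds


# Synthetic 4-dimensional "features" for two classes (an assumption, not SDSS data).
rng = random.Random(42)
X = [[rng.gauss(c, 1.0) for _ in range(4)] for c in (0.0, 3.0) for _ in range(50)]
y = [label for label in (0, 1) for _ in range(50)]

best_params, best_score = grid_search(
    knn_fit_predict, X, y, {"n_neighbors": [1, 3, 5]}
)
print(best_params, round(best_score, 3))
```

Each candidate setting is scored by its mean accuracy over the five validation folds, and the setting with the highest mean is kept. In a setup like the one the abstract describes, the stand-in classifier would be replaced by the CNN and the grid by that model's own hyperparameters.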

Authors: REN Ju-xiang; LIU Zhong-bao (College of Information Engineering, Shanxi Vocational University of Engineering Science and Technology, Jinzhong 030619, China; School of Information Science, Beijing Language and Culture University, Beijing 100083, China)

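Affiliations: College of Information Engineering, Shanxi Vocational University of Engineering Science and Technology; School of Information Science, Beijing Language and Culture University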

Source: Spectroscopy and Spectral Analysis, 2024, No. 12, pp. 3480-3484 (5 pages). Indexed in SCIE, EI, CAS, CSCD, and the Peking University Core Journals list.

Funding: Supported by the National Natural Science Foundation of China (11803080).

Keywords: Massive spectral data; Spectral feature extraction; Pre-training model; Validation of effectiveness

Classification code: TP29 [Automation and Computer Technology: Detection Technology and Automated Devices]
