Abstract
Line indices are widely used to describe the spectral features of astronomical objects because they retain the main physical characteristics of stars. A multi-parameter model built on line-index features therefore supports regression analysis of the covariation in the data and of the intrinsic regularities of the spectral lines. The observed spectra released by LAMOST, the Schmidt telescope with the highest spectral acquisition rate in the world, have already been labelled and provide real data for building a robust regression model. When the line indices of these labelled stellar spectra are analysed, however, the predictors are correlated with one another. In multiple linear regression this collinearity among the explanatory variables inflates the variance and makes the least-squares regression coefficients unstable. The regression remains usable for prediction, but it becomes difficult to read the contribution of each individual predictor from the regression equation. In this paper, the Lick line indices of A-type stars in the LAMOST survey are used as the data source. Spectra with effective temperature (T_eff) between 7 000 and 8 500 K and signal-to-noise ratio above 50 are selected for the regression of T_eff. The LAMOST Data Release 5 (DR5) catalogue contains 86 097 A-type spectra with a T_eff value, and box plots are used to show the overall distribution of this large sample. After statistical analysis of the feature values of 26 line indices, the fields kp12, halpha12 and hgamma12, which have similar distributions and a bandwidth of 12, are selected. This reduces the number of explanatory line-index variables and lowers the variance inflation factor (VIF) of the redundant variables. The experiments fit local regressions to pairwise scatter plots of the observed data and, from the same data, generate high-density scatter plots from the overall outline of the point cloud, using colour and transparency to highlight dense regions. The results show that both multiple linear regression and ridge regression can determine the effective temperature of A-type stars from low-resolution spectra. After biased-estimation experiments on the collinear data, ridge regression yields the better model: it predicts the effective temperature of A-type stars more accurately and characterises the regression behaviour of the spectral lines.
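The two diagnostics named in the abstract, the variance inflation factor and the comparison of least squares with ridge regression, can be illustrated with a minimal sketch. It is not the authors' pipeline. The data are synthetic, the three columns only stand in for correlated line indices such as kp12, halpha12 and hgamma12, and the penalty `alpha=10.0`, the noise levels and the helper names `vif` and `ridge_fit` are assumptions chosen for illustration.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column of X (n_samples, n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all the other columns plus an intercept.
    """
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out[j] = 1.0 / (1.0 - r2)
    return out


def ridge_fit(X, y, alpha):
    """Ridge coefficients on standardized X and centered y.

    alpha = 0 reduces to ordinary least squares; the intercept is
    handled by centering and is therefore not penalized.
    """
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    yc = y - y.mean()
    p = Xs.shape[1]
    return np.linalg.solve(Xs.T @ Xs + alpha * np.eye(p), Xs.T @ yc)


# Synthetic stand-in data (assumption): three nearly identical predictors
# driven by one latent variable, and a temperature-like target in kelvin.
rng = np.random.default_rng(0)
n = 500
base = rng.normal(size=n)
X = np.column_stack([base + 0.05 * rng.normal(size=n) for _ in range(3)])
teff = 7750.0 + 200.0 * base + 30.0 * rng.normal(size=n)

vifs = vif(X)                               # large values flag collinearity
beta_ols = ridge_fit(X, teff, alpha=0.0)    # unpenalized least squares
beta_ridge = ridge_fit(X, teff, alpha=10.0) # shrunken, more stable coefficients

print("VIF:", np.round(vifs, 1))
print("OLS coefficients:  ", np.round(beta_ols, 1))
print("Ridge coefficients:", np.round(beta_ridge, 1))
```

With predictors this strongly correlated, the least-squares coefficients can vary widely from one sample to another while their sum stays roughly fixed. The ridge penalty trades a small bias for coefficients that are smaller in norm and spread more evenly across the correlated columns.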
Authors
薛仁政
陈淑鑫
黄宏本
XUE Ren-zheng; CHEN Shu-xin; HUANG Hong-ben (School of Computer and Control Engineering, Qiqihar University, Qiqihar 161006, China; School of Data Science and Software Engineering, Wuzhou University, Wuzhou 543002, China)
Source
《光谱学与光谱分析》
SCIE
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2019, Issue 8, pp. 2624-2629 (6 pages)
Spectroscopy and Spectral Analysis
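Funding
National Natural Science Foundation of China (U1631239)
National Natural Science Foundation of China, Young Scientists Fund (11803013)
Heilongjiang Provincial Department of Education, Basic Research Operating Funds Special Project (135109248)
Qiqihar Science and Technology Plan Project (GYGG-201720)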
Keywords
Stellar spectra
LAMOST (Large Sky Area Multi-Object Fiber Spectroscopic Telescope)
Ridge regression
Linear model
Lick line index