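Abstract
Using the highly extensible open-source R programming language as a tool, and drawing on its strong data analysis capabilities in statistics and data mining, this work focuses on the main functions and characteristics of RFITSIO, an R package for reading and writing FITS files, and describes in detail the FITS files collected by LAMOST. Stellar spectra are read from the massive LAMOST survey DR2 data with RFITSIO, and the principal component analysis tools in R are used to extract the characteristic quantities, that is, the principal components, of each type of spectrum. The principal components that represent stellar spectral features are extracted from spectra containing a large amount of redundant information. Reconstructing the spectra from these components effectively reduces the influence of noise on the original spectral data and provides a basis for subsequent data mining work.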
Data mining research on large-scale surveys focuses on handling, processing, and extracting information from massive astronomical data sets. In this paper, we apply the extensible R programming language to LAMOST spectral analysis and make full use of its integrated data analysis and visualization capabilities. We mainly study the functions and characteristics of the RFITSIO package for reading and writing FITS files in R. We then group the LAMOST DR2 data according to the released classification results and apply the PCA tools in R to each group to extract spectral features from the large number of noisy spectra. The results show that the spectral features are well preserved through PCA reconstruction. PCA is applied to the FLUX values of the LAMOST spectra to extract their characteristic quantities: rotating the coordinate system eliminates the correlation between spectral features, which reduces the dimensionality of the data and suppresses the effect of noise. This feature extraction method based on dimensionality reduction can serve as an efficient pre-processing step for follow-up data mining on the LAMOST data set.
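The PCA-based feature extraction and reconstruction described above can be illustrated with a minimal sketch. The paper itself works in R with RFITSIO; the Python/NumPy code below is not the authors' implementation. The function name `pca_reconstruct`, the synthetic spectra, the noise level, and the choice of a single retained component are all assumptions made for illustration only.

```python
import numpy as np


def pca_reconstruct(spectra, n_components):
    """Project spectra onto their leading principal components and rebuild them.

    spectra: array of shape (n_spectra, n_pixels), one flux vector per row.
    """
    mean = spectra.mean(axis=0)
    centered = spectra - mean
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T            # low-dimensional features
    reconstructed = scores @ components + mean  # back in flux space
    explained = singular_values**2 / np.sum(singular_values**2)
    return reconstructed, scores, explained[:n_components]


# Synthetic stand-in for one group of spectra (hypothetical data, not LAMOST).
rng = np.random.default_rng(0)
n_spectra, n_pixels = 200, 300
wavelength = np.linspace(4000.0, 9000.0, n_pixels)           # illustrative grid
continuum = 1.0 + 0.3 * (wavelength - 6500.0) / 2500.0
line = -np.exp(-0.5 * ((wavelength - 6563.0) / 40.0) ** 2)   # absorption-like dip
depth = rng.uniform(0.2, 0.8, size=(n_spectra, 1))           # varies per spectrum
clean = continuum + depth * line
noisy = clean + rng.normal(0.0, 0.05, size=clean.shape)

reconstructed, scores, explained = pca_reconstruct(noisy, n_components=1)

# Compare each version against the known noise-free signal.
rms_noisy = np.sqrt(np.mean((noisy - clean) ** 2))
rms_recon = np.sqrt(np.mean((reconstructed - clean) ** 2))
print(f"RMS error vs. clean signal: noisy={rms_noisy:.4f}, reconstructed={rms_recon:.4f}")
```

In this toy setup only one quantity (the line depth) varies between spectra, so a single component is enough. For real spectra the number of retained components would have to be chosen from the explained-variance fractions, and the abstract does not state the value the authors used.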
Authors
Chen Shuxin
Luo Ali
Sun Weimin
Affiliations: College of Mechanical and Electrical Engineering, Qiqihar University, Qiqihar 161006, China; Key Lab of In-fiber Integrated Optics, Ministry of Education of China, Harbin Engineering University, Harbin 150006, China; Key Laboratory of Optical Astronomy, Chinese Academy of Sciences, Beijing 100012, China
Source
Astronomical Research & Technology (《天文研究与技术》)
CSCD
2017, Issue 3, pp. 363-368 (6 pages)
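Funding
National Natural Science Foundation of China (U1631239)
Basic Scientific Research Special Project of the Heilongjiang Provincial Department of Education (135109219)
Qiqihar Science and Technology Plan, Industrial Key Research Project (GYGG-201518)
Qiqihar University Educational Science Research Project (2016072)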
Keywords
R language
FITSIO (Flexible Image Transport System Input Output)
Spectroscopic survey
LAMOST (Large Sky Area Multi-Object Fiber Spectroscopic Telescope)
Principal component analysis