Funding: Supported by the National Natural Science Foundation of China (NSFC, grant Nos. 11973022 and 12373108), the Natural Science Foundation of Guangdong Province (No. 2020A1515010710), and the Hanshan Normal University Startup Foundation for Doctor Scientific Research (No. QD202129).
Abstract: Pulsar detection has recently become an active research topic in radio astronomy. One of its essential procedures is pulsar candidate sifting (PCS), which identifies potential pulsar signals in a survey. However, pulsar candidates are always class-imbalanced: most candidates are non-pulsars such as radio frequency interference (RFI), and only a tiny fraction come from real pulsars. Class imbalance can greatly degrade the performance of machine learning (ML) models, and the cost is heavy when real pulsars are misclassified. To deal with this problem, we focus on feature selection, that is, techniques for choosing the features that best discriminate pulsars from non-pulsars. Feature selection picks a subset of the most relevant features from a feature pool; features that distinguish pulsars from non-pulsars can significantly improve classifier performance even when the data are highly imbalanced. In this work, a two-stage feature selection algorithm called K-fold Relief-Greedy (KFRG) is designed. In the first stage, it filters out irrelevant features according to their K-fold Relief scores; in the second stage, it removes redundant features and selects the most relevant ones by a forward greedy search strategy. Experiments on the data set of the High Time Resolution Universe survey verify that ML models based on KFRG are capable of PCS, correctly separating pulsars from non-pulsars even when the candidates are highly class-imbalanced.
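The two-stage structure of KFRG (a K-fold Relief filter followed by a forward greedy wrapper search) can be illustrated with a minimal Python sketch. Beyond that structure, everything here is an assumption: the simplified Relief score, the random forest used inside the greedy search, the recall criterion, and the keep_ratio and max_features settings are illustrative stand-ins, not the authors' implementation.

```python
# Minimal KFRG-style sketch: K-fold Relief filtering, then forward greedy search.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

def relief_scores(X, y):
    """Score each feature by the gap between its nearest-miss and nearest-hit distances."""
    n, d = X.shape
    scores = np.zeros(d)
    for i in range(n):
        diff = np.abs(X - X[i])              # per-feature distances to sample i
        dist = diff.sum(axis=1)
        dist[i] = np.inf                     # ignore the sample itself
        hit = np.argmin(np.where(y == y[i], dist, np.inf))
        miss = np.argmin(np.where(y != y[i], dist, np.inf))
        scores += diff[miss] - diff[hit]     # reward closeness to hits, distance to misses
    return scores / n

def kfrg_select(X, y, k=5, keep_ratio=0.5, max_features=8):
    """Stage 1: average Relief scores over K folds; Stage 2: forward greedy wrapper."""
    skf = StratifiedKFold(n_splits=k, shuffle=True, random_state=0)
    fold_scores = [relief_scores(X[tr], y[tr]) for tr, _ in skf.split(X, y)]
    mean_scores = np.mean(fold_scores, axis=0)
    n_keep = max(1, int(X.shape[1] * keep_ratio))
    candidates = list(np.argsort(mean_scores)[::-1][:n_keep])

    selected, best = [], -np.inf
    while candidates and len(selected) < max_features:
        trials = {}
        for f in candidates:
            clf = RandomForestClassifier(n_estimators=100, random_state=0)
            # Recall of the pulsar class is an assumed greedy criterion for imbalance.
            trials[f] = cross_val_score(clf, X[:, selected + [f]], y,
                                        cv=skf, scoring="recall").mean()
        f_best = max(trials, key=trials.get)
        if trials[f_best] <= best:           # stop when no remaining feature helps
            break
        best, selected = trials[f_best], selected + [f_best]
        candidates.remove(f_best)
    return selected
```

The O(n^2) Relief loop is kept deliberately simple for readability; on a full survey-scale candidate set it would normally be replaced by a sampled or nearest-neighbor-accelerated variant.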
Funding: Supported by the National Natural Science Foundation of China (NSFC, Grant Nos. 11973022 and U1811464) and the Natural Science Foundation of Guangdong Province (No. 2020A1515010710).
Abstract: With the construction of large telescopes and the explosive growth of observed galaxy data, we face the challenge of improving data processing efficiency while maintaining the accuracy of galaxy morphology classification. This work therefore designs a lightweight deep learning framework, EfficientNet-G3, for galaxy morphology classification. The proposed framework is based on EfficientNet and integrates the Efficient Neural Architecture Search (ENAS) algorithm. Its performance is assessed on the data set from the Galaxy Zoo Challenge Project on Kaggle. Compared with several typical neural networks and deep learning frameworks for galaxy morphology classification, the proposed EfficientNet-G3 model improves the classification accuracy from 95.8% to 96.63%, with an F1-score of 97.1%. Notably, the model uses the fewest parameters, about one tenth of DenseNet161 and one fifth of ResNet-26, yet its accuracy is about one percentage point higher. In terms of both efficiency and accuracy, EfficientNet-G3 can serve as an important reference for fast morphological classification of massive galaxy data.
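As a rough illustration of the kind of lightweight backbone the paper builds on, the sketch below fine-tunes a standard torchvision EfficientNet-B0 with a replaced classification head. It does not reproduce the ENAS-derived EfficientNet-G3 architecture; the B0 variant, the assumed five morphology classes, and the optimizer settings are illustrative assumptions.

```python
# Fine-tuning a compact EfficientNet backbone for galaxy morphology classification.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # assumed number of morphology classes in the Galaxy Zoo subset

def build_model(num_classes: int = NUM_CLASSES) -> nn.Module:
    model = models.efficientnet_b0(weights=None)        # lightweight stand-in backbone
    in_features = model.classifier[1].in_features
    model.classifier[1] = nn.Linear(in_features, num_classes)
    return model

def train_one_epoch(model, loader, optimizer, device="cpu"):
    """loader yields (images, labels) batches of shape (B, 3, H, W) and (B,)."""
    criterion = nn.CrossEntropyLoss()
    model.train()
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

model = build_model()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
```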
Funding: Supported by the China Manned Space Program through its Space Application System, the National Natural Science Foundation of China (NSFC, grant Nos. 11973022 and U1811464), and the Natural Science Foundation of Guangdong Province (No. 2020A1515010710).
Abstract: Machine learning has become a crucial technique for classifying galaxy morphology as a result of the rapid growth of galaxy data. Unfortunately, traditional supervised learning carries a significant labeling cost, since it needs a large amount of labeled data to be effective. FixMatch, a semi-supervised learning algorithm, has become a key tool for exploiting large amounts of unlabeled data. Nevertheless, its performance degrades significantly on large, imbalanced data sets because FixMatch relies on a fixed threshold to filter pseudo-labels. This study therefore proposes a dynamic threshold alignment algorithm based on the FixMatch model. First, the reliable pseudo-label ratio of the class with the most samples is determined, and the ratios of the remaining classes are estimated accordingly. Second, based on the estimated reliable pseudo-label ratio of each class, the algorithm dynamically calculates the threshold for selecting pseudo-labels. This dynamic threshold reduces the per-class accuracy bias and improves learning for classes with fewer samples. Experimental results on galaxy morphology classification show that the proposed algorithm significantly outperforms supervised learning: with 100 labeled samples, accuracy and F1-score improve by 12.8% and 12.6%, respectively. Compared with popular semi-supervised algorithms such as FixMatch and MixMatch, the proposed algorithm achieves better classification performance and greatly reduces the per-class accuracy bias. With 1000 labeled samples, the accuracy on cigar-shaped smooth galaxies, the class with the fewest samples, improves by 37.94% over FixMatch.
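One way to realize such per-class dynamic thresholds is sketched below: the reliable pseudo-label ratio is measured on the majority class against the fixed FixMatch base threshold, the same target ratio is then applied to every other class, and each class threshold is set as the matching confidence quantile, which lowers the cutoff for classes the model is less confident about. The equal-ratio assumption and the quantile rule are this sketch's interpretation, not the paper's exact formula.

```python
# Per-class dynamic thresholds for pseudo-label selection (FixMatch-style).
import numpy as np

def dynamic_thresholds(probs: np.ndarray, base_threshold: float = 0.95) -> np.ndarray:
    """probs: (N, C) softmax outputs of the current model on unlabeled data."""
    conf = probs.max(axis=1)                  # confidence of each pseudo-label
    pred = probs.argmax(axis=1)               # predicted class of each pseudo-label
    n_classes = probs.shape[1]
    counts = np.bincount(pred, minlength=n_classes)

    majority = counts.argmax()
    maj_conf = conf[pred == majority]
    # Reliable pseudo-label ratio of the majority class under the fixed base threshold.
    reliable_ratio = float((maj_conf >= base_threshold).mean()) if maj_conf.size else 0.0

    thresholds = np.full(n_classes, base_threshold)
    for c in range(n_classes):
        c_conf = conf[pred == c]
        if c_conf.size == 0:
            continue
        # Assumed rule: every class targets the same reliable ratio as the majority
        # class, so lower-confidence minority classes get a lower cutoff and
        # contribute more pseudo-labels to training.
        thresholds[c] = np.quantile(c_conf, 1.0 - reliable_ratio)
    return np.minimum(thresholds, base_threshold)   # never exceed the base threshold

def select_pseudo_labels(probs, thresholds):
    """Return accepted pseudo-labels and the boolean mask of accepted samples."""
    conf, pred = probs.max(axis=1), probs.argmax(axis=1)
    mask = conf >= thresholds[pred]
    return pred[mask], mask
```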
Funding: Supported by the National Natural Science Foundation of China (grant Nos. 11973022, 11973049, and U1811464), the Natural Science Foundation of Guangdong Province (No. 2020A1515010710), and the Youth Innovation Promotion Association of the CAS (id. Y202017).
Abstract: The accuracy of estimated stellar atmospheric parameters decreases markedly as the spectral signal-to-noise ratio (S/N) decreases, and there is a huge number of such observations, especially with S/N < 30. It is therefore worthwhile to improve the parameter estimation performance for these spectra, and this work studies the (T_eff, log g, [Fe/H]) estimation problem for LAMOST DR8 low-resolution spectra with 20 ≤ S/N < 30. We propose a data-driven method based on machine learning techniques. First, the scheme detects stellar atmospheric parameter-sensitive features in the spectra with the Least Absolute Shrinkage and Selection Operator (LASSO), rejecting ineffective components and irrelevant data. Second, a Multi-layer Perceptron (MLP) estimates the stellar atmospheric parameters from the LASSO features. Finally, the performance of LASSO-MLP is evaluated by computing and analyzing the consistency between its estimates and the reference values from Apache Point Observatory Galactic Evolution Experiment (APOGEE) high-resolution spectra. Experiments show that the mean absolute errors of T_eff, log g, and [Fe/H] are reduced from the LASP values (137.6 K, 0.195, 0.091 dex) to the LASSO-MLP values (84.32 K, 0.137, 0.063 dex), indicating clear improvements in stellar atmospheric parameter estimation. In addition, this work estimates the stellar atmospheric parameters for 1,162,760 low-resolution spectra with 20 ≤ S/N < 30 from LAMOST DR8 using LASSO-MLP, and releases the estimation catalog, learned model, experimental code, trained model, training data, and test data for scientific exploration and algorithm study.
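The LASSO-then-MLP pipeline can be sketched with standard scikit-learn components: LASSO picks parameter-sensitive pixels, and an MLP regresses (T_eff, log g, [Fe/H]) from the selected features. The regularization strength, hidden-layer sizes, the per-parameter union of LASSO supports, and the label scaling are illustrative assumptions, not the settings used to build the LAMOST DR8 catalog.

```python
# LASSO feature selection followed by MLP regression of stellar atmospheric parameters.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

def lasso_mlp_fit(flux, labels, alpha=1e-3):
    """flux: (N, n_pixels) spectra; labels: (N, 3) reference (T_eff, log g, [Fe/H])."""
    scaler_x, scaler_y = StandardScaler(), StandardScaler()
    X = scaler_x.fit_transform(flux)
    Y = scaler_y.fit_transform(labels)

    # Stage 1: keep the union of pixels with non-zero LASSO coefficients
    # for any of the three parameters.
    selected = np.zeros(X.shape[1], dtype=bool)
    for j in range(Y.shape[1]):
        lasso = Lasso(alpha=alpha, max_iter=10000).fit(X, Y[:, j])
        selected |= lasso.coef_ != 0

    # Stage 2: multi-output MLP regression from the selected pixels.
    mlp = MLPRegressor(hidden_layer_sizes=(256, 128), max_iter=2000,
                       random_state=0).fit(X[:, selected], Y)
    return scaler_x, scaler_y, selected, mlp

def lasso_mlp_predict(model, flux):
    """Apply a fitted (scaler_x, scaler_y, selected, mlp) tuple to new spectra."""
    scaler_x, scaler_y, selected, mlp = model
    X = scaler_x.transform(flux)
    return scaler_y.inverse_transform(mlp.predict(X[:, selected]))
```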