Funding: Partially supported by TAILOR, a project funded by the EU Horizon 2020 research and innovation programme under GA No 952215; funded by refinitiv.com (previously Thomson Reuters).
Abstract: DCAT is an RDF vocabulary designed to facilitate interoperability between data catalogs published on the Web. Since its first release in 2014 as a W3C Recommendation, DCAT has seen wide adoption across communities and domains, particularly in conjunction with implementing the FAIR data principles (for findable, accessible, interoperable and reusable data). These implementation experiences, besides demonstrating the fitness of DCAT to meet its intended purpose, helped identify existing issues and gaps. Moreover, over the last few years, additional requirements have emerged in data catalogs, given the increasing practice of documenting not only datasets but also data services and APIs. This paper illustrates the new version of DCAT, explaining the rationale behind its main revisions and extensions, based on the collected use cases and requirements, and outlines the issues yet to be addressed in future versions of DCAT.
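To make the idea of documenting both datasets and data services in one catalog more concrete, here is a minimal sketch of a catalog description serialized as JSON-LD with the Python standard library. The abstract does not list any vocabulary terms; the `dcat:` and `dct:` terms used here are taken from the published DCAT and Dublin Core namespaces, while every `example.org` IRI and every title is a made-up placeholder.

```python
import json

# Illustrative catalog description. All example.org IRIs and titles are
# hypothetical; only the dcat:/dct: term names come from the vocabularies.
catalog = {
    "@context": {
        "dcat": "http://www.w3.org/ns/dcat#",
        "dct": "http://purl.org/dc/terms/",
    },
    "@id": "https://example.org/catalog",
    "@type": "dcat:Catalog",
    "dct:title": "Example catalog",
    # A dataset with one downloadable distribution.
    "dcat:dataset": [
        {
            "@id": "https://example.org/dataset/1",
            "@type": "dcat:Dataset",
            "dct:title": "Example dataset",
            "dcat:distribution": [
                {
                    "@id": "https://example.org/dataset/1/csv",
                    "@type": "dcat:Distribution",
                    "dcat:downloadURL": {"@id": "https://example.org/dataset/1.csv"},
                }
            ],
        }
    ],
    # A data service (API) that serves the same dataset.
    "dcat:service": [
        {
            "@id": "https://example.org/api",
            "@type": "dcat:DataService",
            "dct:title": "Example API",
            "dcat:endpointURL": {"@id": "https://example.org/api/v1"},
            "dcat:servesDataset": [{"@id": "https://example.org/dataset/1"}],
        }
    ],
}

# Serialize to a JSON-LD document that could be published alongside the catalog.
document = json.dumps(catalog, indent=2)
print(document)
```

The sketch only builds and prints the description; validating it against the vocabulary or loading it into an RDF store would need an RDF library, which is left out here.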
Funding: Funded by the National Basic Research Program of China (973 Program, 2014CB845700) and the National Natural Science Foundation of China (Grant No. 11390371). Funding for the project has been provided by the National Development and Reform Commission.
Abstract: The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) general survey is a spectroscopic survey that will eventually cover approximately half of the celestial sphere and collect 10 million spectra of stars, galaxies and QSOs. Objects in both the pilot survey and the first-year regular survey are included in the LAMOST DR1. The pilot survey started in October 2011 and ended in June 2012, and the data were released to the public as the LAMOST Pilot Data Release in August 2012. The regular survey started in September 2012 and completed its first year of operation in June 2013. The LAMOST DR1 includes a total of 1202 plates containing 2 955 336 spectra, of which 1 790 879 spectra have an observed signal-to-noise ratio (SNR) ≥ 10. All data with SNR ≥ 2 are formally released as LAMOST DR1 under the LAMOST data policy. This data release contains a total of 2 204 696 spectra, of which 1 944 329 are stellar spectra, 12 082 are galaxy spectra and 5017 are quasars. The DR1 includes not only spectra but also three stellar catalogs with measured parameters: late A- and FGK-type stars with high-quality spectra (1 061 918 entries), A-type stars (100 073 entries), and M-type stars (121 522 entries). This paper introduces the survey design, the observational and instrumental limitations, data reduction and analysis, and some caveats. A description of the FITS structure of spectral files and parameter catalogs is also provided.
Funding: Supported by the National Key Basic Research Program of China (NKBRP) 2014CB845700 and by the National Natural Science Foundation of China (Grant Nos. 11473001 and 11233004).
Abstract: The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) published its first data release (DR1) in 2013, which is currently the largest dataset of stellar spectra in the world. We combine the PASTEL catalog and SIMBAD radial velocities as a testing standard to validate stellar parameters (effective temperature Teff, surface gravity log g, metallicity [Fe/H] and radial velocity Vr) derived from DR1. Through cross-identification of the DR1 catalogs and the PASTEL catalog, we obtain a preliminary sample of 422 stars. After removing stellar parameter measurements from problematic spectra and applying effective temperature constraints to the sample, we compare the stellar parameters from DR1 with those from PASTEL and SIMBAD to demonstrate that the DR1 results are reliable in restricted ranges of Teff. We derive standard deviations of 110 K, 0.19 dex and 0.11 dex for Teff, log g and [Fe/H] respectively when Teff < 8000 K, and 4.91 km s^-1 for Vr when Teff < 10 000 K. Systematic errors are negligible except for those of Vr. In addition, metallicities in DR1 are systematically higher than those in PASTEL, in the range of PASTEL [Fe/H] < -1.5.
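The comparison described above comes down to two numbers per parameter: the mean difference between survey and reference values (a systematic offset) and the standard deviation of those differences (the scatter). The following is a minimal sketch of that calculation, not the authors' pipeline. The function name, the sample values, the choice of the sample standard deviation, and the decision to apply the Teff < 8000 K cut to the survey value are all assumptions made for illustration.

```python
from statistics import mean, stdev

def offset_and_scatter(pairs):
    """Return (mean difference, sample standard deviation) of survey - reference."""
    diffs = [survey - reference for survey, reference in pairs]
    return mean(diffs), stdev(diffs)

# Made-up (survey Teff, reference Teff) pairs in kelvin, for illustration only.
teff_pairs = [
    (5800.0, 5750.0),
    (6100.0, 6200.0),
    (4900.0, 4950.0),
    (7200.0, 7100.0),
    (9500.0, 9000.0),  # hotter than the cut, so it is excluded below
]

# Assumed selection: keep stars whose survey Teff is below 8000 K.
selected = [(s, r) for s, r in teff_pairs if s < 8000.0]

offset, scatter = offset_and_scatter(selected)
print(f"N = {len(selected)}, offset = {offset:.1f} K, scatter = {scatter:.1f} K")
```

A scatter that is large compared with the offset, as in this toy sample, corresponds to the case the abstract describes as a negligible systematic error.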
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 10473013, 90412016 and 10778724, and by the 863 project under Grant No. 2006AA01A120.
Abstract: We introduced a decision tree method called Random Forests for multiwavelength data classification. The data were adopted from different databases, including the Sloan Digital Sky Survey (SDSS) Data Release 5, USNO, FIRST and ROSAT. We then studied the discrimination of quasars from stars and the classification of quasars, stars and galaxies, first with the sample from optical and radio bands and then with the sample from optical and X-ray bands. Moreover, feature selection and feature weighting based on Random Forests were investigated, and the performances based on different input patterns were compared. The experimental results show that the random forest method is effective for astronomical object classification and can be applied to other classification problems faced in astronomy. In addition, Random Forests has further merits of its own, e.g. it supports classification, feature selection, feature weighting and outlier detection.
Funding: Supported by the National Natural Science Foundation of China.
Abstract: We compare the performance of Bayesian Belief Networks (BBN), Multilayer Perceptron (MLP) networks and Alternating Decision Trees (ADtree) on separating quasars from stars with the database from the 2MASS and FIRST survey catalogs. Given a training sample of sources of known object types, the classifiers are trained to separate quasars from stars. The features important for classification are selected according to the statistical properties of the sample. We compare the classification results with and without feature selection. Experiments show that the results with feature selection are better than those without it. From the high accuracy found, it is concluded that these automated methods are robust and effective for classifying point sources. They may all be applied to large survey projects (e.g. selecting input catalogs) and to other astronomical issues, such as the parameter measurement of stars and the redshift estimation of galaxies and quasars.
Funding: Supported by the National Program on Key Research and Development Project (Grant No. 2016YFA0400804) and the National Natural Science Foundation of China (Grant Nos. 11603038, 11333004, 11425313 and 11403056).
Abstract: Machine learning has become increasingly popular thanks to its powerful ability to make predictions or calculate suggestions for large amounts of data. We apply machine learning classification to 85 613 922 objects in the Gaia Data Release 2, based on a combination of Pan-STARRS 1 and AllWISE data. The classification results are cross-matched with the Simbad database, and the total accuracy is 91.9%. Our sample is dominated by stars, ~98%, and galaxies make up 2%. For the objects with negative parallaxes, about 2.5% are galaxies and QSOs, while about 99.9% are stars if the relative parallax uncertainties are smaller than 0.2. Our result implies that using the threshold of 0 < σπ/π < 0.2 could yield a very clean stellar sample.
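The threshold 0 < σπ/π < 0.2 quoted above is simple to apply as a filter. The sketch below shows one way to do it in plain Python. The `Source` record, its field names and the sample values are hypothetical and do not reflect the Gaia archive schema.

```python
from dataclasses import dataclass

@dataclass
class Source:
    source_id: int         # hypothetical identifier
    parallax: float        # parallax π, in mas
    parallax_error: float  # parallax uncertainty σπ, in mas

def passes_parallax_cut(src: Source, max_ratio: float = 0.2) -> bool:
    """Return True when 0 < σπ/π < max_ratio (both bounds strict)."""
    if src.parallax == 0:
        return False  # the ratio is undefined for zero parallax
    ratio = src.parallax_error / src.parallax
    return 0 < ratio < max_ratio

# Made-up sources, for illustration only.
sources = [
    Source(1, 2.0, 0.1),   # ratio 0.05: kept
    Source(2, 0.5, 0.3),   # ratio 0.6: rejected, too uncertain
    Source(3, -0.4, 0.2),  # negative parallax gives a negative ratio: rejected
    Source(4, 1.0, 0.2),   # ratio exactly 0.2: rejected by the strict bound
]

clean_ids = [s.source_id for s in sources if passes_parallax_cut(s)]
print(clean_ids)
```

Because the uncertainty is positive, requiring the ratio to be greater than zero also removes every source with a negative parallax, which is the group the abstract reports as containing more galaxies and QSOs.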