Funding: Partly supported by the National Natural Science Foundation of China (Nos. 61632010 and 61602129).
Abstract: Privacy-preserving data release is an important problem in reconciling data openness with individual privacy. The state-of-the-art approach is differential privacy, which offers a strong privacy guarantee without restrictive assumptions about attackers' background knowledge. For genomic data with very high-dimensional attributes, however, current approaches based on differential privacy are not effective. Specifically, a large amount of noise must be injected into genomic data with tens of millions of SNPs (Single Nucleotide Polymorphisms), which significantly degrades the utility of the released data. To address this problem, this paper proposes a genomic data release method that guarantees differential privacy. By executing belief propagation on a factor graph, the method factorizes the distribution of sensitive genomic data into a set of local distributions. After differential-privacy noise is injected into these local distributions, synthetic sensitive data are obtained by sampling from the noisy distributions. The synthetic sensitive data and the factor graph are then used to construct an approximate distribution of the non-sensitive data. Finally, non-sensitive genomic data are sampled from the approximate distribution to construct a synthetic genomic dataset.
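The noise-injection and sampling steps described in this abstract can be illustrated with a minimal sketch. It is not the paper's method: it omits the factor graph and belief propagation entirely and only shows the generic Laplace mechanism applied to one local distribution, followed by sampling. The genotype counts, the `epsilon` value, and all function names are hypothetical, and the sketch assumes each individual contributes to exactly one cell (L1 sensitivity 1 under add/remove-one-record neighbouring datasets).

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def noisy_distribution(counts, epsilon, rng):
    """Add Laplace noise to a table of counts and renormalise it.

    Assumption: L1 sensitivity of the count vector is 1, so the
    noise scale is 1 / epsilon. Clamping and normalising are
    post-processing and do not weaken the privacy guarantee.
    """
    scale = 1.0 / epsilon
    noisy = {k: max(c + laplace_noise(scale, rng), 0.0) for k, c in counts.items()}
    total = sum(noisy.values())
    if total == 0.0:
        # Every noisy count was clamped to zero: fall back to uniform.
        return {k: 1.0 / len(counts) for k in counts}
    return {k: v / total for k, v in noisy.items()}


def sample_synthetic(dist, n, rng):
    # Draw n synthetic values from the noisy distribution.
    keys = list(dist)
    weights = [dist[k] for k in keys]
    return rng.choices(keys, weights=weights, k=n)


rng = random.Random(0)
# Hypothetical local distribution: genotype counts at a single SNP.
genotype_counts = {"AA": 520, "Aa": 380, "aa": 100}
dist = noisy_distribution(genotype_counts, epsilon=1.0, rng=rng)
synthetic = sample_synthetic(dist, 1000, rng)
```

With counts in the hundreds and `epsilon = 1.0`, the noisy distribution stays close to the original; the utility problem the abstract describes arises when this kind of noise has to be spread over tens of millions of attributes.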
Funding: Funded by the National Basic Research Program of China (973 Program, 2014CB845700) and the National Natural Science Foundation of China (Grant No. 11390371). Funding for the project has been provided by the National Development and Reform Commission.
Abstract: The Large sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) general survey is a spectroscopic survey that will eventually cover approximately half of the celestial sphere and collect 10 million spectra of stars, galaxies and QSOs. Objects in both the pilot survey and the first-year regular survey are included in LAMOST DR1. The pilot survey started in October 2011 and ended in June 2012, and its data were released to the public as the LAMOST Pilot Data Release in August 2012. The regular survey started in September 2012 and completed its first year of operation in June 2013. LAMOST DR1 includes a total of 1202 plates containing 2 955 336 spectra, of which 1 790 879 spectra have an observed signal-to-noise ratio (SNR) ≥ 10. All data with SNR ≥ 2 are formally released as LAMOST DR1 under the LAMOST data policy. This data release contains a total of 2 204 696 spectra, of which 1 944 329 are stellar spectra, 12 082 are galaxy spectra and 5017 are quasar spectra. DR1 includes not only spectra but also three stellar catalogs with measured parameters: late A- and FGK-type stars with high-quality spectra (1 061 918 entries), A-type stars (100 073 entries), and M-type stars (121 522 entries). This paper introduces the survey design, the observational and instrumental limitations, data reduction and analysis, and some caveats. A description of the FITS structure of the spectral files and parameter catalogs is also provided.
Funding: Supported in part by the National Natural Science Foundation of China (No. 61672106), in part by the Natural Science Foundation of Beijing, China (L192023), and in part by the project for promoting the Classified Development of Beijing Information Science and Technology University (Nos. 5112211038 and 5112211039).
Abstract: Health monitoring data, or data about infectious diseases such as COVID-19, may need to be constantly updated and dynamically released, but they may contain users' sensitive information. Preserving users' privacy before release is therefore critically important yet challenging. Differential Privacy (DP) is well known to provide effective privacy protection, and the dynamic DP-preserving data release scheme was designed to publish a histogram that meets the DP guarantee. Unfortunately, this scheme may result in high cumulative errors and lower data availability. To address this problem, in this paper we apply Jensen-Shannon (JS) divergence to design the OPTICS (Ordering Points To Identify The Clustering Structure) scheme. It uses JS divergence to measure the difference between the updated data set at the current release time and the private data set at the previous release time. This difference is compared with a threshold, and OPTICS is applied to publish a DP-protected data set only when the difference exceeds the threshold. Our experimental results show that the absolute errors and average relative errors are significantly lower than those of existing works.
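The threshold test described in this abstract can be sketched as follows. This is only an illustration of the JS-divergence comparison, assuming both data sets are summarised as histograms over the same bins with non-negative counts; it does not include the OPTICS clustering or the DP noise step, and the function names and the threshold value used below are hypothetical.

```python
import math


def normalise(hist):
    # Turn non-negative bin counts into a probability vector.
    total = sum(hist)
    return [h / total for h in hist]


def kl(p, q):
    # Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0.
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    # Jensen-Shannon divergence: symmetric and bounded in [0, 1] with log base 2.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def should_republish(current_hist, last_released_hist, threshold):
    # Publish a new protected release only if the data changed enough.
    p = normalise(current_hist)
    q = normalise(last_released_hist)
    return js_divergence(p, q) > threshold
```

For example, `should_republish([10, 10], [10, 10], 0.05)` evaluates to `False` because identical histograms have zero divergence, so no new release (and no additional privacy budget) would be spent at that time step.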
Abstract: This paper describes the data release of the LAMOST pilot survey, covering data reduction, calibration, spectral analysis, data products and data access. The accuracy of the released data and the information in the FITS headers of the spectra are also introduced. The released data set includes 319 000 spectra and a catalog of these objects.
Abstract: An atomic time scale release system for multiple laboratories was built with a modular design, based on the atomic clock data provided by eight domestic timekeeping laboratories. The system comprises three modules: processing of atomic clock data, calculation of the atomic time scale, and release of atomic time scale data. MATLAB is used for data processing and time scale calculation, and a GUI is used for data visualization. The system has a clear algorithmic workflow, simple function modules and a friendly human-machine interface. Results on actual data show that the time difference between the system's integrated atomic time scale and UTC is within ±10 ns, and that the released data can meet the needs of scientific research in related fields in China.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 62102074 and 62032013, the Liaoning Revitalization Talents Program under Grant No. XLYC1902010, the Natural Science Foundation of Liaoning Province of China under Grant No. 2020-MS-091, and the Fundamental Research Funds for the Central Universities of China under Grant No. N2017015.
Abstract: Differential privacy (DP) is widely employed for private data release in the single-party scenario. Data utility can be degraded by the noise that ubiquitous data correlation makes necessary, and this is often addressed by reducing sensitivity through correlation analysis. However, the growing number of multiparty data release applications presents new challenges for existing methods. In this paper, we propose a novel correlated differential privacy scheme for multiparty data release (MP-CRDP). It reduces the merged dataset's dimensionality and correlated sensitivity in two steps to optimize utility. We also propose a multiparty correlation analysis technique. Based on prior knowledge of the multiparty data, a more reasonable and rigorous standard is designed to measure the degree of correlation, which reduces the correlated sensitivity and thus improves data utility. Moreover, by adding noise to the weights of machine learning algorithms and query noise to the released data, MP-CRDP supports the release of both low-noise private data and private machine learning algorithms. Comprehensive experiments on the Adult and Breast Cancer datasets demonstrate the effectiveness and practicability of the proposed method.
Abstract: Objective: This paper aims to establish a simple and practical method for the rapid on-site detection of ammonia nitrogen in water, to transmit the detection results to the Internet via the GSM network, and to publish and update them in real time. Methods: Phenol salt colorimetry was used to measure the absorbance of indophenol blue, the product of the reaction between ammonia nitrogen and phenol salt in water samples, using sodium nitrosoferricyanide-sodium hydroxide as the catalyst and a micro-photoelectric colorimeter developed by the authors; alternatively, a simple visual colorimetric semi-quantitative method was used to measure the ammonia nitrogen content of water samples. The general-purpose GSM wireless communication module built into the micro-photoelectric colorimeter was then used to transmit the test results remotely and to update and release them on the Internet in real time. Results: The correlation of the method was significant, and its precision and accuracy were similar to those of the national standard Nessler's reagent spectrophotometry. The relative standard deviation was 4.4% and the relative error was 2.7%. Detection of ammonia nitrogen in a single water sample can be completed on site, and the results released, within 5-10 min. For quantitative and semi-quantitative detection, the lowest detectable concentrations are 0.05 mg/L and 0.2 mg/L, respectively, and the measurements are essentially free from interference by pH and common ions. Within the coverage area of the GSM network, wireless transmission of the results was unobstructed and free of delay, and the effect was satisfactory. Conclusion: The method is simple, rapid, practical and reliable, and is suitable for rapid field determination of ammonia nitrogen in water and real-time remote transmission of the detection results. It provides a high-efficiency, low-cost and simple technical means for field water quality monitoring and the rapid acquisition of water quality data.
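The precision and accuracy figures quoted in this abstract (relative standard deviation and relative error) follow standard definitions, which the sketch below illustrates. The replicate readings and the reference concentration are hypothetical values chosen for the example, not data from the paper, and the abstract does not state how many replicates were used.

```python
import statistics


def relative_standard_deviation(measurements):
    # Sample standard deviation divided by the mean, in percent (precision).
    mean = statistics.mean(measurements)
    return statistics.stdev(measurements) / mean * 100.0


def relative_error(measured_mean, reference):
    # Absolute deviation from the reference value, in percent (accuracy).
    return abs(measured_mean - reference) / reference * 100.0


# Hypothetical replicate readings of one water sample, in mg/L.
replicates = [0.98, 1.02, 1.05, 0.97, 1.03]
# Hypothetical reference concentration from a standard method, in mg/L.
reference = 1.00

rsd = relative_standard_deviation(replicates)
rel_err = relative_error(statistics.mean(replicates), reference)
print(f"RSD = {rsd:.1f}%, relative error = {rel_err:.1f}%")
```

The relative standard deviation summarises the scatter among repeated readings, while the relative error compares their mean against an independent reference such as the Nessler's reagent result mentioned in the abstract.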