Abstract: Privacy protection for big data linking is discussed here in relation to the 'Structure of Earnings Survey - Administrative Data Project' (SESADP), a big data linking project of Ireland's Central Statistics Office (CSO). The project produced datasets and statistical outputs for the years 2011 to 2014 to meet Eurostat's annual earnings statistics requirements and the Structure of Earnings Survey (SES) Regulation. Linking records across the Census and various public sector datasets made it possible to acquire the information needed to meet these Eurostat requirements. However, the risk of statistical disclosure (i.e., identifying an individual in the dataset) is high unless privacy and confidentiality safeguards are built into the data matching process. This paper examines the three methods used in the SESADP to link records across big datasets, and how to anonymise the data to protect the identity of individuals where potentially disclosive variables exist.
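The anonymisation step can be illustrated with a common disclosure-control technique: coarsening potentially disclosive variables and then checking that every combination of them is shared by at least k records (k-anonymity). The sketch below is a minimal Python example under stated assumptions. The variables (age, sex, county, occupation code), the banding rules and the records are all hypothetical, and this is not presented as the anonymisation method actually used in the SESADP.

```python
from collections import Counter

# Hypothetical record layout: (age, sex, county, occupation_code). Illustrative data only.
Record = tuple[int, str, str, str]

def generalise(record: Record) -> tuple[str, str, str, str]:
    """Coarsen potentially disclosive variables (hypothetical rules):
    band age into 10-year groups and keep only the first digit of the occupation code."""
    age, sex, county, occupation = record
    low = (age // 10) * 10
    return (f"{low}-{low + 9}", sex, county, occupation[:1])

def smallest_group(rows) -> int:
    """k: the size of the smallest group of rows sharing identical values."""
    return min(Counter(rows).values())

def is_k_anonymous(rows, k: int) -> bool:
    """True if every combination of values is shared by at least k rows."""
    return smallest_group(rows) >= k

records = [
    (34, "F", "Cork", "2311"),
    (37, "F", "Cork", "2422"),
    (52, "M", "Dublin", "7123"),
    (58, "M", "Dublin", "7215"),
]
released = [generalise(r) for r in records]

print(smallest_group(records))   # 1: every raw record is unique
print(smallest_group(released))  # 2: each generalised record shares its values with one other
```

Coarser bands raise k but lose analytical detail; choosing which variables to coarsen, and how far, is a trade-off the sketch does not attempt to settle.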
Abstract: This paper describes how data records can be matched across large datasets using a technique called the Identity Correlation Approach (ICA), and compares the ICA with a string matching exercise. Both were employed on a big data project carried out by the CSO: the SESADP (Structure of Earnings Survey Administrative Data Project), which involved linking the 2011 Irish Census dataset to a large Public Sector Dataset. The ICA provides a mathematical tool for linking the datasets, and the matching rate for an exact match can be calculated before the matching process begins. In the ICA, this matching rate is calculated from the MRUI (Matching Rate for Unique Identifier) formula, based on the number of variables and the size of the population, and false positives are eliminated. Because the ICA uses no string matching, names are not required on the dataset, which makes the data more secure and helps ensure confidentiality. The SESADP was highly successful using the ICA technique. The results obtained for the SESADP with string matching and with the ICA are compared here.
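The idea of linking without names can be sketched as an exact match on a composite key built from non-name variables, keeping only keys that occur exactly once in both datasets so that ambiguous matches are never linked. This is a minimal Python illustration under assumptions: the field names (`id`, `dob`, `sex`, `county`) and the records are hypothetical, and the code does not implement the MRUI formula, which the abstract does not state.

```python
from collections import Counter

def link_on_unique_keys(left, right, key_vars):
    """Exact-match two datasets on a composite key of non-name variables.

    A pair is linked only when its key occurs exactly once in BOTH datasets,
    so keys shared by several people (ambiguous matches) are never linked.
    """
    def key(rec):
        return tuple(rec[v] for v in key_vars)

    left_counts = Counter(key(r) for r in left)
    right_counts = Counter(key(r) for r in right)
    # Index only the right-hand records whose key is unique on that side.
    right_index = {key(r): r for r in right if right_counts[key(r)] == 1}

    links = []
    for rec in left:
        k = key(rec)
        if left_counts[k] == 1 and k in right_index:
            links.append((rec["id"], right_index[k]["id"]))
    return links

# Hypothetical extracts: no names, only date of birth, sex and county.
census = [
    {"id": "C1", "dob": "1980-03-14", "sex": "F", "county": "Cork"},
    {"id": "C2", "dob": "1975-07-02", "sex": "M", "county": "Galway"},
    {"id": "C3", "dob": "1975-07-02", "sex": "M", "county": "Galway"},
    {"id": "C4", "dob": "1990-11-30", "sex": "F", "county": "Dublin"},
]
payroll = [
    {"id": "P9", "dob": "1980-03-14", "sex": "F", "county": "Cork"},
    {"id": "P8", "dob": "1975-07-02", "sex": "M", "county": "Galway"},
    {"id": "P7", "dob": "1990-11-30", "sex": "F", "county": "Dublin"},
]

links = link_on_unique_keys(census, payroll, ("dob", "sex", "county"))
print(links)  # C2 and C3 share a key, so neither is linked
```

Dropping keys shared by several records removes ambiguous matches; it does not by itself guarantee that every remaining link is correct, since a person missing from one dataset could share a key with someone else.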
Funding: Project supported by the National Natural Science Foundation of China (Grant No. 10574163).
Abstract: Based on the coherent interaction and action–counteraction principles, we investigate the ground-state properties of small polaron systems, the coherent-squeezed fluctuation correction, and the anomalous lattice quantum fluctuation, using a new variational generator that contains correlated squeezed-coherent coupling and quantum entanglement. Noting that $-2t$ is the tight-binding approximation (T.B.A.) energy, we find for the coherent interaction effect a ground-state energy $E_0 = -2.428t$, of which the coherent-squeezed fluctuation correction $-A_0 t$ is $-0.463t$ (where $t$ is the hopping integral and $\omega$ is the phonon frequency), with electron–one-phonon coupling constant $g = 1$ and electron–two-phonon coupling constant $g_1 = -0.1$. For the action–counteraction effect, however, $E_0 = -2.788t$ and the coherent-squeezed fluctuation correction $-A_0 t$ is $-0.735t$. The polaron binding energy $E_P$ is $-1.38\omega$ for the coherent interaction effect but $-1.88\omega$ for the action–counteraction effect. In particular, the electron–two-phonon interaction noticeably enlarges both the coherent interaction and the coherent-squeezed quantum fluctuation correction. When quantum entanglement is introduced, the evolutions of the squeezed coherent state and of the lattice quantum fluctuation begin to dominate. We then encounter a new quantum phase coherence phenomenon: the repeated collapse and revival of the inversion of the coherent state during the entangled evolution.
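For orientation, the reference value $-2t$ can be recovered under an assumption the abstract does not state: a one-dimensional chain with nearest-neighbour hopping integral $t$ and lattice constant $a$. The band minimum, and the shifts of the reported ground-state energies relative to it, are then:

$$
\begin{aligned}
E(k) &= -2t\cos(ka), \qquad E_{\min} = E(0) = -2t,\\
E_0 - E_{\min} &= -2.428t - (-2t) = -0.428t \quad \text{(coherent interaction)},\\
E_0 - E_{\min} &= -2.788t - (-2t) = -0.788t \quad \text{(action--counteraction)}.
\end{aligned}
$$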
Abstract: Many organizations hold datasets that contain a high volume of personal data on individuals, e.g., health data. Even without a name or address, individuals can be identified from the details (variables) in the dataset. This is an important issue for big data holders such as public sector organizations (e.g., Public Health Organizations) and social media companies. This paper looks at how individuals can be identified from big data using a mathematical approach, and how to apply this mathematical solution to prevent accidental disclosure of a person’s details. The mathematical concept, known as the “Identity Correlation Approach” (ICA), first demonstrates how an individual can be identified without a name or address using a unique set of characteristics (variables). Secondly, having identified the individual, it shows how a solution can be put in place to prevent accidental disclosure of the personal details. Thirdly, it shows how to store data so that accidental leaks of the datasets do not disclose personal details to unauthorized users.
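The first step, identifying an individual from a unique set of characteristics, can be illustrated by counting how many records each combination of variables singles out. The Python sketch below is a minimal illustration under assumptions: the variables and records are hypothetical, and uniqueness within the dataset is used as a simple stand-in for the ICA's own criterion, which the abstract does not spell out.

```python
from collections import Counter
from itertools import combinations

def unique_records(records, variables):
    """Records whose combination of values on `variables` occurs only once."""
    def key(r):
        return tuple(r[v] for v in variables)
    counts = Counter(key(r) for r in records)
    return [r for r in records if counts[key(r)] == 1]

def riskiest_combinations(records, candidate_vars, size):
    """For every combination of `size` variables, count the records it singles out,
    sorted from most to least identifying."""
    results = [(combo, len(unique_records(records, combo)))
               for combo in combinations(candidate_vars, size)]
    return sorted(results, key=lambda item: item[1], reverse=True)

# Hypothetical health extract: no names or addresses; `condition` is the sensitive field.
people = [
    {"age": 34, "sex": "F", "county": "Cork", "condition": "asthma"},
    {"age": 34, "sex": "M", "county": "Cork", "condition": "diabetes"},
    {"age": 41, "sex": "F", "county": "Cork", "condition": "asthma"},
    {"age": 41, "sex": "F", "county": "Kerry", "condition": "none"},
]

# Adding variables makes more people unique: 0, then 2, then 4 records.
for vars_ in [("age",), ("age", "sex"), ("age", "sex", "county")]:
    print(vars_, len(unique_records(people, vars_)))

print(riskiest_combinations(people, ("age", "sex", "county"), 2))
```

Records returned by `unique_records` are the ones whose sensitive attribute (here the hypothetical `condition` field) would be disclosed if the dataset leaked, so they are natural candidates for suppression or coarsening before release.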
Funding: Supported by the National Natural Science Foundation of China (31271959) and the National Basic Research Program of China (2011CB100604).
Abstract: Phenotype is generally determined by the co-expression of diverse genes and multiple regulatory pathways in plants. Gene co-expression analysis combined with physiological trait data provides important information about gene function and regulatory mechanisms. L-Ascorbic acid (AsA), an essential nutrient for human health and plant metabolism, plays key roles in diverse biological processes such as the cell cycle, cell expansion, stress resistance, hormone synthesis, and signaling. Here, we applied a weighted gene correlation network analysis approach, based on gene expression values and AsA content data, to ripening tomato (Solanum lycopersicum L.) fruit with different AsA content levels, which led to the identification of AsA-relevant modules and key genes in AsA regulatory pathways. Twenty-four modules were delineated from the gene expression profiles. Among these modules, one negatively correlated module containing genes involved in redox processes, and one positively correlated module enriched with genes involved in AsA biosynthetic and recycling pathways, were analyzed further. This work indicates that redox pathways as well as hormone-signal pathways are closely correlated with AsA accumulation in ripening tomato fruit, and it allowed us to prioritize candidate genes for follow-up studies to dissect this interplay at the biochemical and molecular levels.
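The module–trait step of a weighted correlation network analysis can be sketched in a few lines: a soft-thresholded adjacency between genes, a module eigengene (the first principal component of a module's standardised expression), and the correlation of each eigengene with the trait. This is a simplified Python/NumPy illustration under assumptions: the data are synthetic, module membership is assigned by hand rather than derived by clustering the network, and `beta = 6` is only a commonly used default, not a value taken from the study.

```python
import numpy as np

def soft_threshold_adjacency(expr: np.ndarray, beta: int = 6) -> np.ndarray:
    """Weighted adjacency a_ij = |cor(gene_i, gene_j)|**beta; genes are columns of expr."""
    adj = np.abs(np.corrcoef(expr, rowvar=False)) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj

def module_eigengene(expr: np.ndarray, genes: list[int]) -> np.ndarray:
    """First principal component (one value per sample) of a module's standardised expression."""
    sub = expr[:, genes]
    z = (sub - sub.mean(axis=0)) / sub.std(axis=0, ddof=1)
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    eigengene = u[:, 0] * s[0]
    # The SVD sign is arbitrary: orient the eigengene along the module's average expression.
    if np.corrcoef(eigengene, z.mean(axis=1))[0, 1] < 0:
        eigengene = -eigengene
    return eigengene

def module_trait_correlation(expr, modules, trait):
    """Pearson correlation between each module eigengene and the trait."""
    return {name: float(np.corrcoef(module_eigengene(expr, genes), trait)[0, 1])
            for name, genes in modules.items()}

# Synthetic example: 12 fruit samples, 6 genes, a hypothetical AsA content trait.
rng = np.random.default_rng(0)
asa = rng.normal(size=12)
noise = 0.3 * rng.normal(size=(12, 6))
expr = np.column_stack([asa, asa, asa, -asa, -asa, -asa]) + noise
modules = {"positive": [0, 1, 2], "negative": [3, 4, 5]}

adjacency = soft_threshold_adjacency(expr)
corrs = module_trait_correlation(expr, modules, asa)
print(corrs)  # strongly positive for "positive", strongly negative for "negative"
```

In a full analysis the adjacency is typically converted to a topological overlap measure and hierarchically clustered to define the modules; the sketch skips that step and only shows how a module's relation to AsA content would be scored.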