The development of technologies such as big data and blockchain has brought convenience to life,but at the same time,privacy and security issues are becoming more and more prominent.The K-anonymity algorithm is an eff...The development of technologies such as big data and blockchain has brought convenience to life,but at the same time,privacy and security issues are becoming more and more prominent.The K-anonymity algorithm is an effective and low computational complexity privacy-preserving algorithm that can safeguard users’privacy by anonymizing big data.However,the algorithm currently suffers from the problem of focusing only on improving user privacy while ignoring data availability.In addition,ignoring the impact of quasi-identified attributes on sensitive attributes causes the usability of the processed data on statistical analysis to be reduced.Based on this,we propose a new K-anonymity algorithm to solve the privacy security problem in the context of big data,while guaranteeing improved data usability.Specifically,we construct a new information loss function based on the information quantity theory.Considering that different quasi-identification attributes have different impacts on sensitive attributes,we set weights for each quasi-identification attribute when designing the information loss function.In addition,to reduce information loss,we improve K-anonymity in two ways.First,we make the loss of information smaller than in the original table while guaranteeing privacy based on common artificial intelligence algorithms,i.e.,greedy algorithm and 2-means clustering algorithm.In addition,we improve the 2-means clustering algorithm by designing a mean-center method to select the initial center of mass.Meanwhile,we design the K-anonymity algorithm of this scheme based on the constructed information loss function,the improved 2-means clustering algorithm,and the greedy algorithm,which reduces the information loss.Finally,we experimentally demonstrate the effectiveness of the algorithm in improving the effect of 2-means clustering and reducing information loss.展开更多
Online Social Networks (OSN) sites allow end-users to share agreat deal of information, which may also contain sensitive information,that may be subject to commercial or non-commercial privacy attacks. Asa result, gua...Online Social Networks (OSN) sites allow end-users to share agreat deal of information, which may also contain sensitive information,that may be subject to commercial or non-commercial privacy attacks. Asa result, guaranteeing various levels of privacy is critical while publishingdata by OSNs. The clustering-based solutions proved an effective mechanismto achieve the privacy notions in OSNs. But fixed clustering limits theperformance and scalability. Data utility degrades with increased privacy,so balancing the privacy utility trade-off is an open research issue. Theresearch has proposed a novel privacy preservation model using the enhancedclustering mechanism to overcome this issue. The proposed model includesphases like pre-processing, enhanced clustering, and ensuring privacy preservation.The enhanced clustering algorithm is the second phase where authorsmodified the existing fixed k-means clustering using the threshold approach.The threshold value is determined based on the supplied OSN data of edges,nodes, and user attributes. Clusters are k-anonymized with multiple graphproperties by a novel one-pass algorithm. After achieving the k-anonymityof clusters, optimization was performed to achieve all privacy models, suchas k-anonymity, t-closeness, and l-diversity. The proposed privacy frameworkachieves privacy of all three network components, i.e., link, node, and userattributes, with improved utility. The authors compare the proposed techniqueto underlying methods using OSN Yelp and Facebook datasets. The proposedapproach outperformed the underlying state of art methods for Degree ofAnonymization, computational efficiency, and information loss.展开更多
Benefiting from the development of Federated Learning(FL)and distributed communication systems,large-scale intelligent applications become possible.Distributed devices not only provide adequate training data,but also ...Benefiting from the development of Federated Learning(FL)and distributed communication systems,large-scale intelligent applications become possible.Distributed devices not only provide adequate training data,but also cause privacy leakage and energy consumption.How to optimize the energy consumption in distributed communication systems,while ensuring the privacy of users and model accuracy,has become an urgent challenge.In this paper,we define the FL as a 3-layer architecture including users,agents and server.In order to find a balance among model training accuracy,privacy-preserving effect,and energy consumption,we design the training process of FL as game models.We use an extensive game tree to analyze the key elements that influence the players’decisions in the single game,and then find the incentive mechanism that meet the social norms through the repeated game.The experimental results show that the Nash equilibrium we obtained satisfies the laws of reality,and the proposed incentive mechanism can also promote users to submit high-quality data in FL.Following the multiple rounds of play,the incentive mechanism can help all players find the optimal strategies for energy,privacy,and accuracy of FL in distributed communication systems.展开更多
The recent interest in the deployment of Generative AI applications that use large language models (LLMs) has brought to the forefront significant privacy concerns, notably the leakage of Personally Identifiable Infor...The recent interest in the deployment of Generative AI applications that use large language models (LLMs) has brought to the forefront significant privacy concerns, notably the leakage of Personally Identifiable Information (PII) and other confidential or protected information that may have been memorized during training, specifically during a fine-tuning or customization process. We describe different black-box attacks from potential adversaries and study their impact on the amount and type of information that may be recovered from commonly used and deployed LLMs. Our research investigates the relationship between PII leakage, memorization, and factors such as model size, architecture, and the nature of attacks employed. The study utilizes two broad categories of attacks: PII leakage-focused attacks (auto-completion and extraction attacks) and memorization-focused attacks (various membership inference attacks). The findings from these investigations are quantified using an array of evaluative metrics, providing a detailed understanding of LLM vulnerabilities and the effectiveness of different attacks.展开更多
With the maturity and development of 5G field,Mobile Edge CrowdSensing(MECS),as an intelligent data collection paradigm,provides a broad prospect for various applications in IoT.However,sensing users as data uploaders...With the maturity and development of 5G field,Mobile Edge CrowdSensing(MECS),as an intelligent data collection paradigm,provides a broad prospect for various applications in IoT.However,sensing users as data uploaders lack a balance between data benefits and privacy threats,leading to conservative data uploads and low revenue or excessive uploads and privacy breaches.To solve this problem,a Dynamic Privacy Measurement and Protection(DPMP)framework is proposed based on differential privacy and reinforcement learning.Firstly,a DPM model is designed to quantify the amount of data privacy,and a calculation method for personalized privacy threshold of different users is also designed.Furthermore,a Dynamic Private sensing data Selection(DPS)algorithm is proposed to help sensing users maximize data benefits within their privacy thresholds.Finally,theoretical analysis and ample experiment results show that DPMP framework is effective and efficient to achieve a balance between data benefits and sensing user privacy protection,in particular,the proposed DPMP framework has 63%and 23%higher training efficiency and data benefits,respectively,compared to the Monte Carlo algorithm.展开更多
As a distributed machine learning method,federated learning(FL)has the advantage of naturally protecting data privacy.It keeps data locally and trains local models through local data to protect the privacy of local da...As a distributed machine learning method,federated learning(FL)has the advantage of naturally protecting data privacy.It keeps data locally and trains local models through local data to protect the privacy of local data.The federated learning method effectively solves the problem of artificial Smart data islands and privacy protection issues.However,existing research shows that attackersmay still steal user information by analyzing the parameters in the federated learning training process and the aggregation parameters on the server side.To solve this problem,differential privacy(DP)techniques are widely used for privacy protection in federated learning.However,adding Gaussian noise perturbations to the data degrades the model learning performance.To address these issues,this paper proposes a differential privacy federated learning scheme based on adaptive Gaussian noise(DPFL-AGN).To protect the data privacy and security of the federated learning training process,adaptive Gaussian noise is specifically added in the training process to hide the real parameters uploaded by the client.In addition,this paper proposes an adaptive noise reduction method.With the convergence of the model,the Gaussian noise in the later stage of the federated learning training process is reduced adaptively.This paper conducts a series of simulation experiments on realMNIST and CIFAR-10 datasets,and the results show that the DPFL-AGN algorithmperforms better compared to the other algorithms.展开更多
The literary review presented in the following paper aims to analyze the tracking tools used in different countries during the period of the COVID-19 pandemic. Tracking apps that have been adopted in many countries to...The literary review presented in the following paper aims to analyze the tracking tools used in different countries during the period of the COVID-19 pandemic. Tracking apps that have been adopted in many countries to collect data in a homogeneous and immediate way have made up for the difficulty of collecting data and standardizing evaluation criteria. However, the regulation on the protection of personal data in the health sector and the adoption of the new General Data Protection Regulation in European countries has placed a strong limitation on their use. This has not been the case in non-European countries, where monitoring methodologies have become widespread. The textual analysis presented is based on co-occurrence and multiple correspondence analysis to show the contact tracing methods adopted in different countries in the pandemic period by relating them to the issue of privacy. It also analyzed the possibility of applying Blockchain technology in applications for tracking contagions from COVID-19 and managing health data to provide a high level of security and transparency, including through anonymization, thus increasing user trust in using the apps.展开更多
Due to the presence of a large amount of personal sensitive information in social networks,privacy preservation issues in social networks have attracted the attention of many scholars.Inspired by the self-nonself disc...Due to the presence of a large amount of personal sensitive information in social networks,privacy preservation issues in social networks have attracted the attention of many scholars.Inspired by the self-nonself discrimination paradigmin the biological immune system,the negative representation of information indicates features such as simplicity and efficiency,which is very suitable for preserving social network privacy.Therefore,we suggest a method to preserve the topology privacy and node attribute privacy of attribute social networks,called AttNetNRI.Specifically,a negative survey-based method is developed to disturb the relationship between nodes in the social network so that the topology structure can be kept private.Moreover,a negative database-based method is proposed to hide node attributes,so that the privacy of node attributes can be preserved while supporting the similarity estimation between different node attributes,which is crucial to the analysis of social networks.To evaluate the performance of the AttNetNRI,empirical studies have been conducted on various attribute social networks and compared with several state-of-the-art methods tailored to preserve the privacy of social networks.The experimental results show the superiority of the developed method in preserving the privacy of attribute social networks and demonstrate the effectiveness of the topology disturbing and attribute hiding parts.The experimental results show the superiority of the developed methods in preserving the privacy of attribute social networks and demonstrate the effectiveness of the topological interference and attribute-hiding components.展开更多
With the growth of requirements for data sharing,a novel business model of digital assets trading has emerged that allows data owners to sell their data for monetary gain.In the distributed ledger of blockchain,howeve...With the growth of requirements for data sharing,a novel business model of digital assets trading has emerged that allows data owners to sell their data for monetary gain.In the distributed ledger of blockchain,however,the privacy of stakeholder's identity and the confidentiality of data content are threatened.Therefore,we proposed a blockchainenabled privacy-preserving and access control scheme to address the above problems.First,the multi-channel mechanism is introduced to provide the privacy protection of distributed ledger inside the channel and achieve coarse-grained access control to digital assets.Then,we use multi-authority attribute-based encryption(MAABE)algorithm to build a fine-grained access control model for data trading in a single channel and describe its instantiation in detail.Security analysis shows that the scheme has IND-CPA secure and can provide privacy protection and collusion resistance.Compared with other schemes,our solution has better performance in privacy protection and access control.The evaluation results demonstrate its effectiveness and practicability.展开更多
With the increase in IoT(Internet of Things)devices comes an inherent challenge of security.In the world today,privacy is the prime concern of every individual.Preserving one’s privacy and keeping anonymity throughou...With the increase in IoT(Internet of Things)devices comes an inherent challenge of security.In the world today,privacy is the prime concern of every individual.Preserving one’s privacy and keeping anonymity throughout the system is a desired functionality that does not come without inevitable trade-offs like scalability and increased complexity and is always exceedingly difficult to manage.The challenge is keeping confidentiality and continuing to make the person innominate throughout the system.To address this,we present our proposed architecture where we manage IoT devices using blockchain technology.Our proposed architecture works on and off blockchain integrated with the closed-circuit television(CCTV)security camera fixed at the rental property.In this framework,the CCTV security camera feed is redirected towards the owner and renter based on the smart contract conditions.One entity(owner or renter)can see the CCTV security camera feed at one time.There is no third-party dependence except for the CCTV security camera deployment phase.Our contributions include the proposition of framework architecture,a novel smart contract algorithm,and the modification to the ring signatures leveraging an existing cryptographic technique.Analyses are made based on different systems’security and key management areas.In an empirical study,our proposed algorithm performed better in key generation,proof generation,and verification times.By comparing similar existing schemes,we have shown the proposed architectures’advantages.Until now,we have developed this system for a specific area in the real world.However,this system is scalable and applicable to other areas like healthcare monitoring systems,which is part of our future work.展开更多
Human mobility prediction is important for many applications.However,training an accurate mobility prediction model requires a large scale of human trajectories,where privacy issues become an important problem.The ris...Human mobility prediction is important for many applications.However,training an accurate mobility prediction model requires a large scale of human trajectories,where privacy issues become an important problem.The rising federated learning provides us with a promising solution to this problem,which enables mobile devices to collaboratively learn a shared prediction model while keeping all the training data on the device,decoupling the ability to do machine learning from the need to store the data in the cloud.However,existing federated learningbased methods either do not provide privacy guarantees or have vulnerability in terms of privacy leakage.In this paper,we combine the techniques of data perturbation and model perturbation mechanisms and propose a privacy-preserving mobility prediction algorithm,where we add noise to the transmitted model and the raw data collaboratively to protect user privacy and keep the mobility prediction performance.Extensive experimental results show that our proposed method significantly outperforms the existing stateof-the-art mobility prediction method in terms of defensive performance against practical attacks while having comparable mobility prediction performance,demonstrating its effectiveness.展开更多
Online tracking mechanisms employed by internet companies for user profiling and targeted advertising raise major privacy concerns. Despite efforts to defend against these mechanisms, they continue to evolve, renderin...Online tracking mechanisms employed by internet companies for user profiling and targeted advertising raise major privacy concerns. Despite efforts to defend against these mechanisms, they continue to evolve, rendering many existing defences ineffective. This study performs a large-scale measurement of online tracking mechanisms across a large pool of websites using the OpenWPM (Open Web Privacy Measurement) platform. It systematically evaluates the effectiveness of several ad blockers and underlying Privacy Enhancing Technologies (PET) that are primarily used to mitigate different tracking techniques. By quantifying the strengths and limitations of these tools against modern tracking methods, the findings highlight gaps in existing privacy protections. Actionable recommendations are provided to enhance user privacy defences, guide tool developers and inform policymakers on addressing invasive online tracking practices.展开更多
With the promotion of digital currency,how to effectively solve the authenticity,privacy and usability of digital currency issuance has been a key problem.Redactable signature scheme(RSS)can provide the verification o...With the promotion of digital currency,how to effectively solve the authenticity,privacy and usability of digital currency issuance has been a key problem.Redactable signature scheme(RSS)can provide the verification of the integrity and source of the generated sub-documents and solve the privacy problem in digital currency by removing blocks from the signed documents.Unfortunately,it has not realized the consolidation of signed documents,which can not solve the problem of merging two digital currencies.Now,we introduce the concept of weight based on the threshold secret sharing scheme(TSSS)and present a redactable signature scheme with merge algorithm(RSS-MA)using the quasi-commutative accumulator.Our scheme can reduce the communication overhead by utilizing the merge algorithm when transmitting multiple digital currency signatures.Furthermore,this can effectively hide the scale of users’private monetary assets and the number of transactions between users.While meeting the three properties of digital currency issuance,in order to ensure the availability of digital currency after redacting,editors shall not remove the relevant identification information block form digital currency.Finally,our security proof and the analysis of efficiency show that RSS-MA greatly improves the communication and computation efficiency when transmitting multiple signatures.展开更多
With its untameable and traceable properties,blockchain technology has been widely used in the field of data sharing.How to preserve individual privacy while enabling efficient data queries is one of the primary issue...With its untameable and traceable properties,blockchain technology has been widely used in the field of data sharing.How to preserve individual privacy while enabling efficient data queries is one of the primary issues with secure data sharing.In this paper,we study verifiable keyword frequency(KF)queries with local differential privacy in blockchain.Both the numerical and the keyword attributes are present in data objects;the latter are sensitive and require privacy protection.However,prior studies in blockchain have the problem of trilemma in privacy protection and are unable to handle KF queries.We propose an efficient framework that protects data owners’privacy on keyword attributes while enabling quick and verifiable query processing for KF queries.The framework computes an estimate of a keyword’s frequency and is efficient in query time and verification object(VO)size.A utility-optimized local differential privacy technique is used for privacy protection.The data owner adds noise locally into data based on local differential privacy so that the attacker cannot infer the owner of the keywords while keeping the difference in the probability distribution of the KF within the privacy budget.We propose the VB-cm tree as the authenticated data structure(ADS).The VB-cm tree combines the Verkle tree and the Count-Min sketch(CM-sketch)to lower the VO size and query time.The VB-cm tree uses the vector commitment to verify the query results.The fixed-size CM-sketch,which summarizes the frequency of multiple keywords,is used to estimate the KF via hashing operations.We conduct an extensive evaluation of the proposed framework.The experimental results show that compared to theMerkle B+tree,the query time is reduced by 52.38%,and the VO size is reduced by more than one order of magnitude.展开更多
The EU’s Artificial Intelligence Act(AI Act)imposes requirements for the privacy compliance of AI systems.AI systems must comply with privacy laws such as the GDPR when providing services.These laws provide users wit...The EU’s Artificial Intelligence Act(AI Act)imposes requirements for the privacy compliance of AI systems.AI systems must comply with privacy laws such as the GDPR when providing services.These laws provide users with the right to issue a Data Subject Access Request(DSAR).Responding to such requests requires database administrators to identify information related to an individual accurately.However,manual compliance poses significant challenges and is error-prone.Database administrators need to write queries through time-consuming labor.The demand for large amounts of data by AI systems has driven the development of NoSQL databases.Due to the flexible schema of NoSQL databases,identifying personal information becomes even more challenging.This paper develops an automated tool to identify personal information that can help organizations respond to DSAR.Our tool employs a combination of various technologies,including schema extraction of NoSQL databases and relationship identification from query logs.We describe the algorithm used by our tool,detailing how it discovers and extracts implicit relationships from NoSQL databases and generates relationship graphs to help developers accurately identify personal data.We evaluate our tool on three datasets,covering different database designs,achieving an F1 score of 0.77 to 1.Experimental results demonstrate that our tool successfully identifies information relevant to the data subject.Our tool reduces manual effort and simplifies GDPR compliance,showing practical application value in enhancing the privacy performance of NOSQL databases and AI systems.展开更多
The use of privacy-enhanced facial recognition has increased in response to growing concerns about data securityand privacy in the digital age. This trend is spurred by rising demand for face recognition technology in...The use of privacy-enhanced facial recognition has increased in response to growing concerns about data securityand privacy in the digital age. This trend is spurred by rising demand for face recognition technology in a varietyof industries, including access control, law enforcement, surveillance, and internet communication. However,the growing usage of face recognition technology has created serious concerns about data monitoring and userprivacy preferences, especially in context-aware systems. In response to these problems, this study provides a novelframework that integrates sophisticated approaches such as Generative Adversarial Networks (GANs), Blockchain,and distributed computing to solve privacy concerns while maintaining exact face recognition. The framework’spainstaking design and execution strive to strike a compromise between precise face recognition and protectingpersonal data integrity in an increasingly interconnected environment. Using cutting-edge tools like Dlib for faceanalysis,Ray Cluster for distributed computing, and Blockchain for decentralized identity verification, the proposedsystem provides scalable and secure facial analysis while protecting user privacy. The study’s contributions includethe creation of a sustainable and scalable solution for privacy-aware face recognition, the implementation of flexibleprivacy computing approaches based on Blockchain networks, and the demonstration of higher performanceover previous methods. Specifically, the proposed StyleGAN model has an outstanding accuracy rate of 93.84%while processing high-resolution images from the CelebA-HQ dataset, beating other evaluated models such asProgressive GAN 90.27%, CycleGAN 89.80%, and MGAN 80.80%. With improvements in accuracy, speed, andprivacy protection, the framework has great promise for practical use in a variety of fields that need face recognitiontechnology. This study paves the way for future research in privacy-enhanced face recognition systems, emphasizingthe significance of using cutting-edge technology to meet rising privacy issues in digital identity.展开更多
With the rapid development of information technology,IoT devices play a huge role in physiological health data detection.The exponential growth of medical data requires us to reasonably allocate storage space for clou...With the rapid development of information technology,IoT devices play a huge role in physiological health data detection.The exponential growth of medical data requires us to reasonably allocate storage space for cloud servers and edge nodes.The storage capacity of edge nodes close to users is limited.We should store hotspot data in edge nodes as much as possible,so as to ensure response timeliness and access hit rate;However,the current scheme cannot guarantee that every sub-message in a complete data stored by the edge node meets the requirements of hot data;How to complete the detection and deletion of redundant data in edge nodes under the premise of protecting user privacy and data dynamic integrity has become a challenging problem.Our paper proposes a redundant data detection method that meets the privacy protection requirements.By scanning the cipher text,it is determined whether each sub-message of the data in the edge node meets the requirements of the hot data.It has the same effect as zero-knowledge proof,and it will not reveal the privacy of users.In addition,for redundant sub-data that does not meet the requirements of hot data,our paper proposes a redundant data deletion scheme that meets the dynamic integrity of the data.We use Content Extraction Signature(CES)to generate the remaining hot data signature after the redundant data is deleted.The feasibility of the scheme is proved through safety analysis and efficiency analysis.展开更多
The overgeneralisation may happen because most studies on data publishing for multiple sensitive attributes(SAs)have not considered the personalised privacy requirement.Furthermore,sensitive information disclosure may...The overgeneralisation may happen because most studies on data publishing for multiple sensitive attributes(SAs)have not considered the personalised privacy requirement.Furthermore,sensitive information disclosure may also be caused by these personalised requirements.To address the matter,this article develops a personalised data publishing method for multiple SAs.According to the requirements of individuals,the new method partitions SAs values into two categories:private values and public values,and breaks the association between them for privacy guarantees.For the private values,this paper takes the process of anonymisation,while the public values are released without this process.An algorithm is designed to achieve the privacy mode,where the selectivity is determined by the sensitive value frequency and undesirable objects.The experimental results show that the proposed method can provide more information utility when compared with previous methods.The theoretic analyses and experiments also indicate that the privacy can be guaranteed even though the public values are known to an adversary.The overgeneralisation and privacy breach caused by the personalised requirement can be avoided by the new method.展开更多
The increasing data pool in finance sectors forces machine learning(ML)to step into new complications.Banking data has significant financial implications and is confidential.Combining users data from several organizat...The increasing data pool in finance sectors forces machine learning(ML)to step into new complications.Banking data has significant financial implications and is confidential.Combining users data from several organizations for various banking services may result in various intrusions and privacy leakages.As a result,this study employs federated learning(FL)using a flower paradigm to preserve each organization’s privacy while collaborating to build a robust shared global model.However,diverse data distributions in the collaborative training process might result in inadequate model learning and a lack of privacy.To address this issue,the present paper proposes the imple-mentation of Federated Averaging(FedAvg)and Federated Proximal(FedProx)methods in the flower framework,which take advantage of the data locality while training and guaranteeing global convergence.Resultantly improves the privacy of the local models.This analysis used the credit card and Canadian Institute for Cybersecurity Intrusion Detection Evaluation(CICIDS)datasets.Precision,recall,and accuracy as performance indicators to show the efficacy of the proposed strategy using FedAvg and FedProx.The experimental findings suggest that the proposed approach helps to safely use banking data from diverse sources to enhance customer banking services by obtaining accuracy of 99.55%and 83.72%for FedAvg and 99.57%,and 84.63%for FedProx.展开更多
In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.Howev...In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.However,our comprehensive review of existing literature reveals that there needs to be more studies that engage with key-value data collection.Such studies would simultaneously collect the frequencies of keys and the mean of values associated with each key.Additionally,the allocation of the privacy budget between the frequencies of keys and the means of values for each key does not yield an optimal utility tradeoff.Recognizing the importance of obtaining accurate key frequencies and mean estimations for key-value data collection,this paper presents a novel framework:the Key-Strategy Framework forKey-ValueDataCollection under LDP.Initially,theKey-StrategyUnary Encoding(KS-UE)strategy is proposed within non-interactive frameworks for the purpose of privacy budget allocation to achieve precise key frequencies;subsequently,the Key-Strategy Generalized Randomized Response(KS-GRR)strategy is introduced for interactive frameworks to enhance the efficiency of collecting frequent keys through group-anditeration methods.Both strategies are adapted for scenarios in which users possess either a single or multiple key-value pairs.Theoretically,we demonstrate that the variance of KS-UE is lower than that of existing methods.These claims are substantiated through extensive experimental evaluation on real-world datasets,confirming the effectiveness and efficiency of the KS-UE and KS-GRR strategies.展开更多
基金Foundation of National Natural Science Foundation of China(62202118)Scientific and Technological Research Projects from Guizhou Education Department([2023]003)+1 种基金Guizhou Provincial Department of Science and Technology Hundred Levels of Innovative Talents Project(GCC[2023]018)Top Technology Talent Project from Guizhou Education Department([2022]073).
文摘The development of technologies such as big data and blockchain has brought convenience to life,but at the same time,privacy and security issues are becoming more and more prominent.The K-anonymity algorithm is an effective and low computational complexity privacy-preserving algorithm that can safeguard users’privacy by anonymizing big data.However,the algorithm currently suffers from the problem of focusing only on improving user privacy while ignoring data availability.In addition,ignoring the impact of quasi-identified attributes on sensitive attributes causes the usability of the processed data on statistical analysis to be reduced.Based on this,we propose a new K-anonymity algorithm to solve the privacy security problem in the context of big data,while guaranteeing improved data usability.Specifically,we construct a new information loss function based on the information quantity theory.Considering that different quasi-identification attributes have different impacts on sensitive attributes,we set weights for each quasi-identification attribute when designing the information loss function.In addition,to reduce information loss,we improve K-anonymity in two ways.First,we make the loss of information smaller than in the original table while guaranteeing privacy based on common artificial intelligence algorithms,i.e.,greedy algorithm and 2-means clustering algorithm.In addition,we improve the 2-means clustering algorithm by designing a mean-center method to select the initial center of mass.Meanwhile,we design the K-anonymity algorithm of this scheme based on the constructed information loss function,the improved 2-means clustering algorithm,and the greedy algorithm,which reduces the information loss.Finally,we experimentally demonstrate the effectiveness of the algorithm in improving the effect of 2-means clustering and reducing information loss.
文摘Online Social Networks (OSN) sites allow end-users to share agreat deal of information, which may also contain sensitive information,that may be subject to commercial or non-commercial privacy attacks. Asa result, guaranteeing various levels of privacy is critical while publishingdata by OSNs. The clustering-based solutions proved an effective mechanismto achieve the privacy notions in OSNs. But fixed clustering limits theperformance and scalability. Data utility degrades with increased privacy,so balancing the privacy utility trade-off is an open research issue. Theresearch has proposed a novel privacy preservation model using the enhancedclustering mechanism to overcome this issue. The proposed model includesphases like pre-processing, enhanced clustering, and ensuring privacy preservation.The enhanced clustering algorithm is the second phase where authorsmodified the existing fixed k-means clustering using the threshold approach.The threshold value is determined based on the supplied OSN data of edges,nodes, and user attributes. Clusters are k-anonymized with multiple graphproperties by a novel one-pass algorithm. After achieving the k-anonymityof clusters, optimization was performed to achieve all privacy models, suchas k-anonymity, t-closeness, and l-diversity. The proposed privacy frameworkachieves privacy of all three network components, i.e., link, node, and userattributes, with improved utility. The authors compare the proposed techniqueto underlying methods using OSN Yelp and Facebook datasets. The proposedapproach outperformed the underlying state of art methods for Degree ofAnonymization, computational efficiency, and information loss.
基金sponsored by the National Key R&D Program of China(No.2018YFB2100400)the National Natural Science Foundation of China(No.62002077,61872100)+4 种基金the Major Research Plan of the National Natural Science Foundation of China(92167203)the Guangdong Basic and Applied Basic Research Foundation(No.2020A1515110385)the China Postdoctoral Science Foundation(No.2022M710860)the Zhejiang Lab(No.2020NF0AB01)Guangzhou Science and Technology Plan Project(202102010440).
文摘Benefiting from the development of Federated Learning(FL)and distributed communication systems,large-scale intelligent applications become possible.Distributed devices not only provide adequate training data,but also cause privacy leakage and energy consumption.How to optimize the energy consumption in distributed communication systems,while ensuring the privacy of users and model accuracy,has become an urgent challenge.In this paper,we define the FL as a 3-layer architecture including users,agents and server.In order to find a balance among model training accuracy,privacy-preserving effect,and energy consumption,we design the training process of FL as game models.We use an extensive game tree to analyze the key elements that influence the players’decisions in the single game,and then find the incentive mechanism that meet the social norms through the repeated game.The experimental results show that the Nash equilibrium we obtained satisfies the laws of reality,and the proposed incentive mechanism can also promote users to submit high-quality data in FL.Following the multiple rounds of play,the incentive mechanism can help all players find the optimal strategies for energy,privacy,and accuracy of FL in distributed communication systems.
文摘The recent interest in the deployment of Generative AI applications that use large language models (LLMs) has brought to the forefront significant privacy concerns, notably the leakage of Personally Identifiable Information (PII) and other confidential or protected information that may have been memorized during training, specifically during a fine-tuning or customization process. We describe different black-box attacks from potential adversaries and study their impact on the amount and type of information that may be recovered from commonly used and deployed LLMs. Our research investigates the relationship between PII leakage, memorization, and factors such as model size, architecture, and the nature of attacks employed. The study utilizes two broad categories of attacks: PII leakage-focused attacks (auto-completion and extraction attacks) and memorization-focused attacks (various membership inference attacks). The findings from these investigations are quantified using an array of evaluative metrics, providing a detailed understanding of LLM vulnerabilities and the effectiveness of different attacks.
基金supported in part by the National Natural Science Foundation of China under Grant U1905211,Grant 61872088,Grant 62072109,Grant 61872090,and Grant U1804263in part by the Guangxi Key Laboratory of Trusted Software under Grant KX202042+3 种基金in part by the Science and Technology Major Support Program of Guizhou Province under Grant 20183001in part by the Science and Technology Program of Guizhou Province under Grant 20191098in part by the Project of High-level Innovative Talents of Guizhou Province under Grant 20206008in part by the Open Research Fund of Key Laboratory of Cryptography of Zhejiang Province under Grant ZCL21015.
文摘With the maturity and development of 5G field,Mobile Edge CrowdSensing(MECS),as an intelligent data collection paradigm,provides a broad prospect for various applications in IoT.However,sensing users as data uploaders lack a balance between data benefits and privacy threats,leading to conservative data uploads and low revenue or excessive uploads and privacy breaches.To solve this problem,a Dynamic Privacy Measurement and Protection(DPMP)framework is proposed based on differential privacy and reinforcement learning.Firstly,a DPM model is designed to quantify the amount of data privacy,and a calculation method for personalized privacy threshold of different users is also designed.Furthermore,a Dynamic Private sensing data Selection(DPS)algorithm is proposed to help sensing users maximize data benefits within their privacy thresholds.Finally,theoretical analysis and ample experiment results show that DPMP framework is effective and efficient to achieve a balance between data benefits and sensing user privacy protection,in particular,the proposed DPMP framework has 63%and 23%higher training efficiency and data benefits,respectively,compared to the Monte Carlo algorithm.
基金the Sichuan Provincial Science and Technology Department Project under Grant 2019YFN0104the Yibin Science and Technology Plan Project under Grant 2021GY008the Sichuan University of Science and Engineering Postgraduate Innovation Fund Project under Grant Y2022154.
文摘As a distributed machine learning method,federated learning(FL)has the advantage of naturally protecting data privacy.It keeps data locally and trains local models through local data to protect the privacy of local data.The federated learning method effectively solves the problem of artificial Smart data islands and privacy protection issues.However,existing research shows that attackersmay still steal user information by analyzing the parameters in the federated learning training process and the aggregation parameters on the server side.To solve this problem,differential privacy(DP)techniques are widely used for privacy protection in federated learning.However,adding Gaussian noise perturbations to the data degrades the model learning performance.To address these issues,this paper proposes a differential privacy federated learning scheme based on adaptive Gaussian noise(DPFL-AGN).To protect the data privacy and security of the federated learning training process,adaptive Gaussian noise is specifically added in the training process to hide the real parameters uploaded by the client.In addition,this paper proposes an adaptive noise reduction method.With the convergence of the model,the Gaussian noise in the later stage of the federated learning training process is reduced adaptively.This paper conducts a series of simulation experiments on realMNIST and CIFAR-10 datasets,and the results show that the DPFL-AGN algorithmperforms better compared to the other algorithms.
文摘The literary review presented in the following paper aims to analyze the tracking tools used in different countries during the period of the COVID-19 pandemic. Tracking apps that have been adopted in many countries to collect data in a homogeneous and immediate way have made up for the difficulty of collecting data and standardizing evaluation criteria. However, the regulation on the protection of personal data in the health sector and the adoption of the new General Data Protection Regulation in European countries has placed a strong limitation on their use. This has not been the case in non-European countries, where monitoring methodologies have become widespread. The textual analysis presented is based on co-occurrence and multiple correspondence analysis to show the contact tracing methods adopted in different countries in the pandemic period by relating them to the issue of privacy. It also analyzed the possibility of applying Blockchain technology in applications for tracking contagions from COVID-19 and managing health data to provide a high level of security and transparency, including through anonymization, thus increasing user trust in using the apps.
基金supported by the National Natural Science Foundation of China(Nos.62006001,62372001)the Natural Science Foundation of Chongqing City(Grant No.CSTC2021JCYJ-MSXMX0002).
文摘Due to the presence of a large amount of personal sensitive information in social networks,privacy preservation issues in social networks have attracted the attention of many scholars.Inspired by the self-nonself discrimination paradigmin the biological immune system,the negative representation of information indicates features such as simplicity and efficiency,which is very suitable for preserving social network privacy.Therefore,we suggest a method to preserve the topology privacy and node attribute privacy of attribute social networks,called AttNetNRI.Specifically,a negative survey-based method is developed to disturb the relationship between nodes in the social network so that the topology structure can be kept private.Moreover,a negative database-based method is proposed to hide node attributes,so that the privacy of node attributes can be preserved while supporting the similarity estimation between different node attributes,which is crucial to the analysis of social networks.To evaluate the performance of the AttNetNRI,empirical studies have been conducted on various attribute social networks and compared with several state-of-the-art methods tailored to preserve the privacy of social networks.The experimental results show the superiority of the developed method in preserving the privacy of attribute social networks and demonstrate the effectiveness of the topology disturbing and attribute hiding parts.The experimental results show the superiority of the developed methods in preserving the privacy of attribute social networks and demonstrate the effectiveness of the topological interference and attribute-hiding components.
基金supported by National Key Research and Development Plan in China(Grant No.2020YFB1005500)Beijing Natural Science Foundation(Grant No.M21034)BUPT Excellent Ph.D Students Foundation(Grant No.CX2023218)。
文摘With the growth of requirements for data sharing,a novel business model of digital assets trading has emerged that allows data owners to sell their data for monetary gain.In the distributed ledger of blockchain,however,the privacy of stakeholder's identity and the confidentiality of data content are threatened.Therefore,we proposed a blockchainenabled privacy-preserving and access control scheme to address the above problems.First,the multi-channel mechanism is introduced to provide the privacy protection of distributed ledger inside the channel and achieve coarse-grained access control to digital assets.Then,we use multi-authority attribute-based encryption(MAABE)algorithm to build a fine-grained access control model for data trading in a single channel and describe its instantiation in detail.Security analysis shows that the scheme has IND-CPA secure and can provide privacy protection and collusion resistance.Compared with other schemes,our solution has better performance in privacy protection and access control.The evaluation results demonstrate its effectiveness and practicability.
基金This work was supported by Institute of Information&Communications Technology Planning&Evaluation(IITP)under the Artificial Intelligence Convergence Innovation Human Resources Development(IITP-2023-RS-2023-00255968)Grantthe ITRC(Information Technology Research Center)Support Program(IITP-2021-0-02051)funded by theKorea government(MSIT).
文摘With the increase in IoT(Internet of Things)devices comes an inherent challenge of security.In the world today,privacy is the prime concern of every individual.Preserving one’s privacy and keeping anonymity throughout the system is a desired functionality that does not come without inevitable trade-offs like scalability and increased complexity and is always exceedingly difficult to manage.The challenge is keeping confidentiality and continuing to make the person innominate throughout the system.To address this,we present our proposed architecture where we manage IoT devices using blockchain technology.Our proposed architecture works on and off blockchain integrated with the closed-circuit television(CCTV)security camera fixed at the rental property.In this framework,the CCTV security camera feed is redirected towards the owner and renter based on the smart contract conditions.One entity(owner or renter)can see the CCTV security camera feed at one time.There is no third-party dependence except for the CCTV security camera deployment phase.Our contributions include the proposition of framework architecture,a novel smart contract algorithm,and the modification to the ring signatures leveraging an existing cryptographic technique.Analyses are made based on different systems’security and key management areas.In an empirical study,our proposed algorithm performed better in key generation,proof generation,and verification times.By comparing similar existing schemes,we have shown the proposed architectures’advantages.Until now,we have developed this system for a specific area in the real world.However,this system is scalable and applicable to other areas like healthcare monitoring systems,which is part of our future work.
基金supported in part by the National Key Research and Development Program of China under 2020AAA0106000the National Natural Science Foundation of China under U20B2060 and U21B2036supported by a grant from the Guoqiang Institute, Tsinghua University under 2021GQG1005
文摘Human mobility prediction is important for many applications.However,training an accurate mobility prediction model requires a large scale of human trajectories,where privacy issues become an important problem.The rising federated learning provides us with a promising solution to this problem,which enables mobile devices to collaboratively learn a shared prediction model while keeping all the training data on the device,decoupling the ability to do machine learning from the need to store the data in the cloud.However,existing federated learningbased methods either do not provide privacy guarantees or have vulnerability in terms of privacy leakage.In this paper,we combine the techniques of data perturbation and model perturbation mechanisms and propose a privacy-preserving mobility prediction algorithm,where we add noise to the transmitted model and the raw data collaboratively to protect user privacy and keep the mobility prediction performance.Extensive experimental results show that our proposed method significantly outperforms the existing stateof-the-art mobility prediction method in terms of defensive performance against practical attacks while having comparable mobility prediction performance,demonstrating its effectiveness.
文摘Online tracking mechanisms employed by internet companies for user profiling and targeted advertising raise major privacy concerns. Despite efforts to defend against these mechanisms, they continue to evolve, rendering many existing defences ineffective. This study performs a large-scale measurement of online tracking mechanisms across a large pool of websites using the OpenWPM (Open Web Privacy Measurement) platform. It systematically evaluates the effectiveness of several ad blockers and underlying Privacy Enhancing Technologies (PET) that are primarily used to mitigate different tracking techniques. By quantifying the strengths and limitations of these tools against modern tracking methods, the findings highlight gaps in existing privacy protections. Actionable recommendations are provided to enhance user privacy defences, guide tool developers and inform policymakers on addressing invasive online tracking practices.
基金supported by Support Plan of Scientific and Technological Innovation Team in Universities of Henan Province(20IRTSTHN013)Shaanxi Key Laboratory of Information Communication Network and Security,Xi’an University of Posts&Telecommunications,Xi’an,Shaanxi 710121,China(ICNS202006)The National Natural Science Fund(No.61802117).
文摘With the promotion of digital currency,how to effectively solve the authenticity,privacy and usability of digital currency issuance has been a key problem.Redactable signature scheme(RSS)can provide the verification of the integrity and source of the generated sub-documents and solve the privacy problem in digital currency by removing blocks from the signed documents.Unfortunately,it has not realized the consolidation of signed documents,which can not solve the problem of merging two digital currencies.Now,we introduce the concept of weight based on the threshold secret sharing scheme(TSSS)and present a redactable signature scheme with merge algorithm(RSS-MA)using the quasi-commutative accumulator.Our scheme can reduce the communication overhead by utilizing the merge algorithm when transmitting multiple digital currency signatures.Furthermore,this can effectively hide the scale of users’private monetary assets and the number of transactions between users.While meeting the three properties of digital currency issuance,in order to ensure the availability of digital currency after redacting,editors shall not remove the relevant identification information block form digital currency.Finally,our security proof and the analysis of efficiency show that RSS-MA greatly improves the communication and computation efficiency when transmitting multiple signatures.
文摘With its untameable and traceable properties,blockchain technology has been widely used in the field of data sharing.How to preserve individual privacy while enabling efficient data queries is one of the primary issues with secure data sharing.In this paper,we study verifiable keyword frequency(KF)queries with local differential privacy in blockchain.Both the numerical and the keyword attributes are present in data objects;the latter are sensitive and require privacy protection.However,prior studies in blockchain have the problem of trilemma in privacy protection and are unable to handle KF queries.We propose an efficient framework that protects data owners’privacy on keyword attributes while enabling quick and verifiable query processing for KF queries.The framework computes an estimate of a keyword’s frequency and is efficient in query time and verification object(VO)size.A utility-optimized local differential privacy technique is used for privacy protection.The data owner adds noise locally into data based on local differential privacy so that the attacker cannot infer the owner of the keywords while keeping the difference in the probability distribution of the KF within the privacy budget.We propose the VB-cm tree as the authenticated data structure(ADS).The VB-cm tree combines the Verkle tree and the Count-Min sketch(CM-sketch)to lower the VO size and query time.The VB-cm tree uses the vector commitment to verify the query results.The fixed-size CM-sketch,which summarizes the frequency of multiple keywords,is used to estimate the KF via hashing operations.We conduct an extensive evaluation of the proposed framework.The experimental results show that compared to theMerkle B+tree,the query time is reduced by 52.38%,and the VO size is reduced by more than one order of magnitude.
基金supported by the National Natural Science Foundation of China(No.62302242)the China Postdoctoral Science Foundation(No.2023M731802).
文摘The EU’s Artificial Intelligence Act(AI Act)imposes requirements for the privacy compliance of AI systems.AI systems must comply with privacy laws such as the GDPR when providing services.These laws provide users with the right to issue a Data Subject Access Request(DSAR).Responding to such requests requires database administrators to identify information related to an individual accurately.However,manual compliance poses significant challenges and is error-prone.Database administrators need to write queries through time-consuming labor.The demand for large amounts of data by AI systems has driven the development of NoSQL databases.Due to the flexible schema of NoSQL databases,identifying personal information becomes even more challenging.This paper develops an automated tool to identify personal information that can help organizations respond to DSAR.Our tool employs a combination of various technologies,including schema extraction of NoSQL databases and relationship identification from query logs.We describe the algorithm used by our tool,detailing how it discovers and extracts implicit relationships from NoSQL databases and generates relationship graphs to help developers accurately identify personal data.We evaluate our tool on three datasets,covering different database designs,achieving an F1 score of 0.77 to 1.Experimental results demonstrate that our tool successfully identifies information relevant to the data subject.Our tool reduces manual effort and simplifies GDPR compliance,showing practical application value in enhancing the privacy performance of NOSQL databases and AI systems.
文摘The use of privacy-enhanced facial recognition has increased in response to growing concerns about data securityand privacy in the digital age. This trend is spurred by rising demand for face recognition technology in a varietyof industries, including access control, law enforcement, surveillance, and internet communication. However,the growing usage of face recognition technology has created serious concerns about data monitoring and userprivacy preferences, especially in context-aware systems. In response to these problems, this study provides a novelframework that integrates sophisticated approaches such as Generative Adversarial Networks (GANs), Blockchain,and distributed computing to solve privacy concerns while maintaining exact face recognition. The framework’spainstaking design and execution strive to strike a compromise between precise face recognition and protectingpersonal data integrity in an increasingly interconnected environment. Using cutting-edge tools like Dlib for faceanalysis,Ray Cluster for distributed computing, and Blockchain for decentralized identity verification, the proposedsystem provides scalable and secure facial analysis while protecting user privacy. The study’s contributions includethe creation of a sustainable and scalable solution for privacy-aware face recognition, the implementation of flexibleprivacy computing approaches based on Blockchain networks, and the demonstration of higher performanceover previous methods. Specifically, the proposed StyleGAN model has an outstanding accuracy rate of 93.84%while processing high-resolution images from the CelebA-HQ dataset, beating other evaluated models such asProgressive GAN 90.27%, CycleGAN 89.80%, and MGAN 80.80%. With improvements in accuracy, speed, andprivacy protection, the framework has great promise for practical use in a variety of fields that need face recognitiontechnology. This study paves the way for future research in privacy-enhanced face recognition systems, emphasizingthe significance of using cutting-edge technology to meet rising privacy issues in digital identity.
基金sponsored by the National Natural Science Foundation of China under grant number No. 62172353, No. 62302114, No. U20B2046 and No. 62172115Innovation Fund Program of the Engineering Research Center for Integration and Application of Digital Learning Technology of Ministry of Education No.1331007 and No. 1311022+1 种基金Natural Science Foundation of the Jiangsu Higher Education Institutions Grant No. 17KJB520044Six Talent Peaks Project in Jiangsu Province No.XYDXX-108
文摘With the rapid development of information technology,IoT devices play a huge role in physiological health data detection.The exponential growth of medical data requires us to reasonably allocate storage space for cloud servers and edge nodes.The storage capacity of edge nodes close to users is limited.We should store hotspot data in edge nodes as much as possible,so as to ensure response timeliness and access hit rate;However,the current scheme cannot guarantee that every sub-message in a complete data stored by the edge node meets the requirements of hot data;How to complete the detection and deletion of redundant data in edge nodes under the premise of protecting user privacy and data dynamic integrity has become a challenging problem.Our paper proposes a redundant data detection method that meets the privacy protection requirements.By scanning the cipher text,it is determined whether each sub-message of the data in the edge node meets the requirements of the hot data.It has the same effect as zero-knowledge proof,and it will not reveal the privacy of users.In addition,for redundant sub-data that does not meet the requirements of hot data,our paper proposes a redundant data deletion scheme that meets the dynamic integrity of the data.We use Content Extraction Signature(CES)to generate the remaining hot data signature after the redundant data is deleted.The feasibility of the scheme is proved through safety analysis and efficiency analysis.
基金Doctoral research start-up fund of Guangxi Normal UniversityGuangzhou Research Institute of Communication University of China Common Construction Project,Sunflower-the Aging Intelligent CommunityGuangxi project of improving Middle-aged/Young teachers'ability,Grant/Award Number:2020KY020323。
文摘The overgeneralisation may happen because most studies on data publishing for multiple sensitive attributes(SAs)have not considered the personalised privacy requirement.Furthermore,sensitive information disclosure may also be caused by these personalised requirements.To address the matter,this article develops a personalised data publishing method for multiple SAs.According to the requirements of individuals,the new method partitions SAs values into two categories:private values and public values,and breaks the association between them for privacy guarantees.For the private values,this paper takes the process of anonymisation,while the public values are released without this process.An algorithm is designed to achieve the privacy mode,where the selectivity is determined by the sensitive value frequency and undesirable objects.The experimental results show that the proposed method can provide more information utility when compared with previous methods.The theoretic analyses and experiments also indicate that the privacy can be guaranteed even though the public values are known to an adversary.The overgeneralisation and privacy breach caused by the personalised requirement can be avoided by the new method.
文摘The increasing data pool in finance sectors forces machine learning(ML)to step into new complications.Banking data has significant financial implications and is confidential.Combining users data from several organizations for various banking services may result in various intrusions and privacy leakages.As a result,this study employs federated learning(FL)using a flower paradigm to preserve each organization’s privacy while collaborating to build a robust shared global model.However,diverse data distributions in the collaborative training process might result in inadequate model learning and a lack of privacy.To address this issue,the present paper proposes the imple-mentation of Federated Averaging(FedAvg)and Federated Proximal(FedProx)methods in the flower framework,which take advantage of the data locality while training and guaranteeing global convergence.Resultantly improves the privacy of the local models.This analysis used the credit card and Canadian Institute for Cybersecurity Intrusion Detection Evaluation(CICIDS)datasets.Precision,recall,and accuracy as performance indicators to show the efficacy of the proposed strategy using FedAvg and FedProx.The experimental findings suggest that the proposed approach helps to safely use banking data from diverse sources to enhance customer banking services by obtaining accuracy of 99.55%and 83.72%for FedAvg and 99.57%,and 84.63%for FedProx.
基金supported by a grant fromthe National Key R&DProgram of China.
文摘In recent years,the research field of data collection under local differential privacy(LDP)has expanded its focus fromelementary data types to includemore complex structural data,such as set-value and graph data.However,our comprehensive review of existing literature reveals that there needs to be more studies that engage with key-value data collection.Such studies would simultaneously collect the frequencies of keys and the mean of values associated with each key.Additionally,the allocation of the privacy budget between the frequencies of keys and the means of values for each key does not yield an optimal utility tradeoff.Recognizing the importance of obtaining accurate key frequencies and mean estimations for key-value data collection,this paper presents a novel framework:the Key-Strategy Framework forKey-ValueDataCollection under LDP.Initially,theKey-StrategyUnary Encoding(KS-UE)strategy is proposed within non-interactive frameworks for the purpose of privacy budget allocation to achieve precise key frequencies;subsequently,the Key-Strategy Generalized Randomized Response(KS-GRR)strategy is introduced for interactive frameworks to enhance the efficiency of collecting frequent keys through group-anditeration methods.Both strategies are adapted for scenarios in which users possess either a single or multiple key-value pairs.Theoretically,we demonstrate that the variance of KS-UE is lower than that of existing methods.These claims are substantiated through extensive experimental evaluation on real-world datasets,confirming the effectiveness and efficiency of the KS-UE and KS-GRR strategies.