摘要
如今,图数据已经被广泛地应用于现实生活与科学研究当中,有巨大的使用和研究价值.但与此同时,针对图数据的收集与发布中也存在巨大的隐私风险.如何在保护图隐私的同时,发布与收集可用图数据,是目前个人、企业、政府等面临的重大挑战.本文首先从隐私信息所包含的内容、不同的隐私泄露场景,以及敌手模型三个方面深入地剖析了图数据在使用中存在的隐私风险,然后重点从攻击和防御两个角度展开介绍.针对攻击而言,本文分析了当前可行的图数据隐私攻击与攻击量化算法及其算法原理.针对防御而言,本文总结了简单匿名、图修改、聚类,以及差分隐私四种图数据隐私防御技术;分析了集中与分布两种数据存储场景下,不同类型图数据使用的各类隐私防御算法,以及数据隐私性与可用性度量方法.最后本文综合已有的研究成果,指出了图数据上隐私保护研究当前存在的问题、面临的挑战,及未来的研究方向.
Graph,as a typical data type,can not only represent entities,but also relations and connections among entities.It has a preferable value for both use and study.Thus,the graph has been widely adopted in real-world applications and academic research,such as social networks,disease transmission networks,fraud detection et al.Though applied prevalently,the collection and publication of graphs are suffered from a strong privacy risk.Both the presence of a node or an edge and attributes on nodes and edges may be private information.The leakage of sensitive information can result in severe consequences for individuals,enterprises,and governments,which include but are not limited to life threats,reputation damages,and fall of market values.Therefore,it is imminent to study privacy-preserving methods for graph collection and publication.Directly applying the existing privacy-preserving techniques is insufficient for graph protection.First,strong data correlations put an obstacle.Adopting some of the privacypreserving techniques straightforwardly on graphs may severely destroy data utility by damaging data correlations.While the other techniques cannot provide a strong privacy guarantee as data correlations may increase the privacy risks.Second,it is hard to protect all private information at one time.Graphs often involve abundant sensitive information.Protecting all kinds of sensitive information with existing privacy-preserving techniques may bring too much perturbance to remain a high data utility.Striking a good balance on privacy and data utility for designing privacypreserving techniques on graphs is extremely challenging.Our survey makes a deep analysis of the privacy risks in the graph data collection and publication from three aspects:definition of private information,scenarios for privacy information leakage,the adversary models.Then,we conduct a comprehensive review on both privacy attacks and privacy defenses on graphs.The privacy attacks algorithms are roughly divided into types:seed-based attacks,seed-free attacks.By comparing these two types of attacks,we conclude that the seed-based attacks can achieve higher attacking accuracy by asking the adversaries equipped with strong background knowledge.On the contrary,seed-free attacks have a slightly lower attacking accuracy.Despite this,it is more practical,effective,and robust.In addition to attack algorithms,attack quantification methods are also presented in this work.For privacy defenses,we first introduce four types of privacypreserving techniques for graphs including naïve anonymization,graph modification,clustering,and differential privacy.Then,we review different defending algorithms in both centralized settings and decentralized settings.Specifically,different strategies have been proposed for four types of graphs including adjacent matrices,statistics,random graph parameters,and synthetic graphs in both types of settings.After investigating the algorithms for privacy attacking and defending,we further analyze the defensive effect of existing algorithms against different attacks.At last,challenges faced in privacy-preserving technique development that still need to be worked on are pointed out.Accordingly,we propose possible new techniques that can be adopted to graphs and introduce new scenarios where new privacy risks are emerging.In summary,though many efforts have been put in studying privacy-preserving schemes on graphs,a lot of progress still needs to be made in the future.
作者
刘宇涵
陈红
刘艺璇
赵丹
李翠平
LIU Yu-Han;CEHN Hong;LIU Yi-Xuan;ZHAO Dan;LI Cui-Ping(Key Laboratory of Data Engineering and Knowledge Engineering of Education(Renmin University),Beijing 100872;School of Information,Renmin University,Beijing 100872)
出处
《计算机学报》
EI
CAS
CSCD
北大核心
2022年第4期702-734,共33页
Chinese Journal of Computers
基金
国家重点研发计划(No.2018YFB1004401)
国家自然科学基金(No.62072460,62076245,61772537,61772536,62172424)
北京市自然科学基金(4212022)
中国人民大学科学研究基金(中央高校基本科研业务费专项资金)(21XNH179)资助.
关键词
数据发布
数据收集
图数据
隐私保护
差分隐私
data publication
data collection
graphs
privacy-preserving
differential privacy