A Privacy-Preserving Data Publishing Algorithm for Clustering Application (cited by: 13)
Abstract Privacy-preserving micro-data publishing protects sensitive data while keeping the published data usable. Most existing publishing methods, however, are limited to categorical data sets, and the usability they maintain is mainly that of aggregate queries, frequent itemset analysis, and classification mining. This paper addresses clustering analysis, another important data mining task, and the problem of obfuscating and publishing the numerical attribute data that clustering commonly handles. It proposes the obfuscation algorithm NeSDO. The algorithm analyzes each data record's characteristics with respect to clustering usability by examining the density within the record's neighborhood, and on that basis introduces the definitions of individual data record and common data record. Using synthetic-data substitution as the perturbation method, it defines a positive neighborhood record set and a negative neighborhood record set for each individual data record. A common data record is replaced by the mean of the records in its k nearest neighborhood. An individual data record is replaced by the mean of the records in its positive neighborhood or in its negative neighborhood. Theoretical analysis and experimental results indicate that NeSDO protects sensitive numerical values from disclosure well while effectively maintaining the clustering usability of the published data.
Source Journal of Computer Research and Development (《计算机研究与发展》), indexed in EI, CSCD, and the Peking University Core Journals list, 2010, Issue 12, pp. 2083-2089 (7 pages)
Funding National Natural Science Foundation of China (61003057, 60973023); Natural Science Foundation of Jiangsu Province (BK2006095)
Keywords privacy-preserving data publishing; clustering; k nearest neighborhood; individual data record; common data record
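
The substitution step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not state how individual records or their positive and negative neighborhoods are determined, so that decision is left to a hypothetical caller-supplied hook, `individual_neighborhood`. The function names, the Euclidean distance, and the choice to compute every mean from the original (unpublished) values are all assumptions made for illustration.

```python
import math


def k_nearest(data, i, k):
    """Return the indices of the k records closest to record i, excluding i.

    Euclidean distance is an assumption; the abstract names no metric.
    """
    others = (j for j in range(len(data)) if j != i)
    return sorted(others, key=lambda j: math.dist(data[i], data[j]))[:k]


def mean_record(records):
    """Attribute-wise mean of a non-empty list of numerical records."""
    return tuple(sum(column) / len(records) for column in zip(*records))


def publish(data, k, individual_neighborhood=None):
    """Replace every record with a synthetic mean of neighboring records.

    individual_neighborhood is a hypothetical hook, not taken from the paper.
    Called as individual_neighborhood(data, i), it returns the indices of the
    positive or negative neighborhood to average over when record i is an
    individual record, or None when record i is a common record.
    """
    published = []
    for i in range(len(data)):
        chosen = individual_neighborhood(data, i) if individual_neighborhood else None
        if not chosen:
            # Common record: use its k nearest neighbors, as in the abstract.
            chosen = k_nearest(data, i, k)
        # Means are taken over the original values, never over substituted ones.
        published.append(mean_record([data[j] for j in chosen]))
    return published
```

Calling `publish(records, k)` without a hook treats every record as a common record. Passing a hook lets a caller plug in whatever criterion distinguishes individual records, which keeps the undefined parts of the method out of the sketch.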


