Abstract
Generalization causes substantial information loss, and the loss grows as data dimensionality increases. To address this problem, this paper proposes an anonymization algorithm based on local partitioning. While guaranteeing k-anonymity and l-diversity, the algorithm first partitions the data table horizontally into buckets according to a value constraint on the sensitive attribute column and the distance between records. It then partitions each bucket vertically into multiple columns according to the correlations between attributes. Finally, it randomly permutes the data within each column of the same bucket. Experimental results show that, on high-dimensional data, the algorithm reduces information loss by 47% to 183% and improves the correlation retention rate by 24% to 118% compared with the LGAA-CP algorithm. Compared with the Slicing algorithm, its information loss differs by no more than 1.5%, while its correlation retention rate is 8.9% to 22.8% higher. The analysis indicates that the algorithm is effective at ensuring both privacy protection and data utility for high-dimensional data.
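The three steps named in the abstract (horizontal bucketing, vertical column grouping, random permutation within each bucket) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the greedy bucketing rule below ignores record distance, the column groups are supplied by the caller instead of being derived from attribute correlations, and all function names (`is_l_diverse`, `make_buckets`, `permute_columns`, `anonymize`) are hypothetical.

```python
import random


def is_l_diverse(bucket, sensitive, l):
    """Return True if the bucket holds at least l distinct sensitive values."""
    return len({rec[sensitive] for rec in bucket}) >= l


def make_buckets(records, sensitive, k, l):
    """Horizontal partition (assumed greedy rule, not the paper's).

    Records are appended in input order; a bucket is closed as soon as it
    has at least k records and at least l distinct sensitive values.
    Leftover records are merged into the last bucket, which keeps both
    properties because adding records cannot reduce size or diversity.
    """
    buckets, current = [], []
    for rec in records:
        current.append(rec)
        if len(current) >= k and is_l_diverse(current, sensitive, l):
            buckets.append(current)
            current = []
    if current:
        if not buckets:
            raise ValueError("cannot satisfy k and l with the given records")
        buckets[-1].extend(current)
    return buckets


def permute_columns(bucket, column_groups, rng):
    """Shuffle each column group independently within one bucket.

    Values inside a group stay together, so associations within a group
    survive; the link between different groups is broken.
    """
    out = [dict() for _ in bucket]
    for group in column_groups:
        rows = [tuple(rec[attr] for attr in group) for rec in bucket]
        rng.shuffle(rows)
        for new_rec, row in zip(out, rows):
            new_rec.update(zip(group, row))
    return out


def anonymize(records, column_groups, sensitive, k, l, seed=0):
    """Bucketize the records, then permute column groups inside each bucket."""
    rng = random.Random(seed)
    buckets = make_buckets(records, sensitive, k, l)
    return [permute_columns(b, column_groups, rng) for b in buckets]
```

Calling `anonymize(records, [("age", "zip"), ("disease",)], "disease", k=3, l=2)` on a list of dictionaries would return a list of buckets in which the `(age, zip)` pairs and the `disease` values have each been shuffled independently. Keeping attributes in the same group is what preserves their association, which is why the choice of groups matters for the retention rate the abstract reports.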
Authors
王芳
余敦辉
张万山
Wang Fang; Yu Dunhui; Zhang Wanshan (School of Computer Science & Information Engineering, Hubei University, Wuhan 430062, China; Hubei Education Information Engineering & Technology Research Center, Wuhan 430062, China)
Source
《计算机应用研究》
CSCD
PKU Core Journal
2019, Issue 10, pp. 3048-3053 (6 pages)
Application Research of Computers
Funding
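National Key R&D Program of China (2016YFB0800401)
National Basic Research Program of China ("973" Program) (2014CB340404)
National Natural Science Foundation of China (61373037, 61672387)
Major Special Project of Hubei Province (2018ACA133)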
Keywords
privacy-preserving data publishing
k-anonymity
l-diversity
sensitive attribute column value constraint