Abstract
Most existing anonymization methods treat all sensitive attribute values alike, ignoring how sensitive each value is and how the values are distributed. This leaves them vulnerable to similarity attacks and skewness attacks and leads to high information loss. To address this, an automatic anonymization method for sensitive data in a distrustful (untrusted) environment, based on frequent itemset discovery, is proposed. The method first performs semantic analysis of the real-world meaning implied by the sensitive attributes and computes the semantic similarity and semantic dissimilarity between sensitive attribute values. The attributes in the quasi-identifier are then divided into numeric and categorical types, and a corresponding generalization strategy is given for each. The generalization information loss of the sensitive data set is calculated, and a weight-aware utility metric function for the generalized anonymized table is defined using the idea of frequent itemset discovery. Finally, the average weighted information amount over all equivalence groups of sensitive data is computed, which completes automatic anonymization in the distrustful environment. Experimental results show that the proposed method effectively reduces the risk of sensitive data leakage while also reducing the information loss caused by the automatic generalization process.
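The abstract does not give the paper's formulas, so the following is only a minimal sketch of what a weighted generalization information loss over equivalence groups could look like. It substitutes the well-known Normalized Certainty Penalty (NCP) style measure for the paper's own metric. The function names, the flat (one-level) categorical hierarchy, the attribute weights, and the sample records are all assumptions made for illustration.

```python
def numeric_loss(values, domain_min, domain_max):
    """NCP-style loss for a numeric attribute (assumed stand-in metric):
    width of the generalized interval divided by the domain width."""
    if domain_max == domain_min:
        return 0.0
    return (max(values) - min(values)) / (domain_max - domain_min)


def categorical_loss(values, domain_size):
    """NCP-style loss for a categorical attribute, assuming a flat hierarchy:
    0 if the group holds one value, else the covered fraction of the domain."""
    distinct = len(set(values))
    if distinct <= 1:
        return 0.0
    return distinct / domain_size


def group_loss(group, numeric_domains, categorical_domains, weights):
    """Weighted loss of one equivalence group, normalized by total weight."""
    total_weight = sum(weights.values())
    loss = 0.0
    for attr, (lo, hi) in numeric_domains.items():
        loss += weights[attr] * numeric_loss([r[attr] for r in group], lo, hi)
    for attr, size in categorical_domains.items():
        loss += weights[attr] * categorical_loss([r[attr] for r in group], size)
    return loss / total_weight


def table_loss(groups, numeric_domains, categorical_domains, weights):
    """Plain average of the weighted loss over all equivalence groups."""
    losses = [
        group_loss(g, numeric_domains, categorical_domains, weights)
        for g in groups
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # Hypothetical quasi-identifier data: two equivalence groups of two records.
    groups = [
        [{"age": 20, "city": "A"}, {"age": 30, "city": "A"}],
        [{"age": 40, "city": "B"}, {"age": 60, "city": "C"}],
    ]
    numeric_domains = {"age": (20, 60)}   # assumed domain bounds
    categorical_domains = {"city": 4}     # assumed number of distinct cities
    weights = {"age": 2.0, "city": 1.0}   # assumed attribute weights
    print(table_loss(groups, numeric_domains, categorical_domains, weights))
```

A lower value means the generalized table stays closer to the original data. Averaging over groups rather than over records is one reading of "average over all equivalence groups" and may differ from the paper's definition.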
Source
Computer Simulation (《计算机仿真》)
Peking University Core Journal (北大核心)
2017, Issue 5, pp. 257-260 (4 pages)
Keywords
Distrustful environment
Sensitive data
Automatic anonymity method