Abstract
[Objective] This paper improves how anonymous groups are partitioned under existing data-masking (de-identification) techniques, in order to obtain a better masking model and algorithm. [Methods] Building on k-anonymity, we revised the criterion used to select the partitioning dimension and constructed a new algorithm that uses a KD tree as its storage structure. We implemented the algorithm in Python and assessed its feasibility and effectiveness by comparing the number of anonymous groups produced and the NCP (normalized certainty penalty) percentage. [Results] The new algorithm maximizes the number of anonymous groups generated for the whole dataset after masking, and its NCP percentage is lower than that of comparable algorithms. [Limitations] For datasets in which one attribute is markedly more dispersed than the others, repeatedly recomputing the partitioning dimension is cumbersome. [Conclusions] Compared with traditional algorithms, the new algorithm produces more anonymous groups; compared with similar algorithms, it incurs less information loss.
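The abstract does not state the authors' revised dimension-selection criterion, so the sketch below substitutes the standard Mondrian-style heuristic as a stand-in: at each node, try dimensions in order of widest normalized range, cut at the median, and accept the cut only if both halves keep at least k records. The recursion tree is a KD tree whose leaves are the anonymous groups. The NCP percentage assumes numeric quasi-identifiers with equal attribute weights. The names `partition`, `ncp_percent`, and `anonymize` are hypothetical, and this is an illustration of the general technique, not the paper's algorithm.

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]  # one row of numeric quasi-identifiers (assumption)


def _span(records: Sequence[Record], dim: int) -> float:
    """Range (max - min) of one attribute over a set of records."""
    values = [r[dim] for r in records]
    return max(values) - min(values)


def partition(records: List[Record], k: int,
              full_spans: Sequence[float]) -> List[List[Record]]:
    """Recursively split records KD-tree style; every leaf keeps >= k records.

    Dimension choice is the standard Mondrian heuristic (widest normalized
    range first), NOT the paper's modified criterion, which is not given here.
    """
    d = len(full_spans)
    dims = sorted(
        range(d),
        key=lambda i: _span(records, i) / full_spans[i] if full_spans[i] else 0.0,
        reverse=True,
    )
    for dim in dims:
        ordered = sorted(records, key=lambda r: r[dim])
        median = ordered[len(ordered) // 2][dim]
        # Strict cut at the median so equal values never straddle two groups.
        left = [r for r in ordered if r[dim] < median]
        right = [r for r in ordered if r[dim] >= median]
        if len(left) >= k and len(right) >= k:
            return (partition(left, k, full_spans)
                    + partition(right, k, full_spans))
    return [records]  # no allowable cut: this node becomes one anonymous group


def ncp_percent(groups: List[List[Record]], full_spans: Sequence[float]) -> float:
    """Equal-weight NCP over all records and attributes, as a percentage.

    Each record in a group is charged (group range / full-table range)
    per attribute; the total is divided by (records * attributes).
    """
    d = len(full_spans)
    n = sum(len(g) for g in groups)
    total = 0.0
    for g in groups:
        for i in range(d):
            if full_spans[i]:
                total += len(g) * _span(g, i) / full_spans[i]
    return 100.0 * total / (n * d)


def anonymize(records: Sequence[Record], k: int):
    """Partition a table into k-anonymous groups and report its NCP percentage."""
    d = len(records[0])
    full_spans = [_span(records, i) for i in range(d)]
    groups = partition(list(records), k, full_spans)
    return groups, ncp_percent(groups, full_spans)


if __name__ == "__main__":
    # Toy table of (age, region code); values are illustrative only.
    data = [(25, 1), (26, 2), (30, 3), (31, 4),
            (45, 5), (46, 6), (50, 7), (52, 8)]
    groups, ncp = anonymize(data, k=2)
    print(len(groups), round(ncp, 2))  # prints "4 9.46"
```

With k = 2 the toy table splits into four two-record groups. Raising k yields fewer, wider groups and a higher NCP percentage, which is the trade-off the abstract measures when it compares group counts and NCP across algorithms.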
Authors
周倩伊
王亚民
王闯
Zhou Qianyi, Wang Yamin, Wang Chuang (School of Economics and Management, Xidian University, Xi'an 710126, China)
Source
Data Analysis and Knowledge Discovery (《数据分析与知识发现》)
CSSCI
CSCD
Peking University Core Journals (北大核心)
2018, No. 2, pp. 58-63 (6 pages)