Abstract
To protect data privacy efficiently while preserving data utility, a classification anonymization method based on weight attribute entropy (Weight-properties Entropy for Classification Anonymous, WECA) is proposed. Targeting the specific application of data classification mining, the method uses information entropy to compute how important each quasi-identifier attribute in the data set is for classifying the sensitive attribute, and selects the quasi-identifier attribute with the highest classification weight attribute entropy ratio to split the classification tree favorably. The method also constructs an information loss metric for classification anonymization, which preserves classification utility while better protecting private data. Finally, experimental results on a standard data set show that the algorithm achieves higher classification accuracy with less anonymization loss, improving data availability.
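The attribute-selection step described in the abstract can be illustrated with a small sketch. The abstract does not give the WECA formula, so this example substitutes the standard information gain ratio (as used in C4.5) as a stand-in for the "weight attribute entropy ratio"; the function names, attribute names, and toy records are all hypothetical and not taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def gain_ratio(records, attribute, sensitive):
    """Information gain of `attribute` about `sensitive`, divided by the
    entropy of `attribute` itself. Assumption: this stands in for the
    paper's weight attribute entropy ratio, which the abstract does not define."""
    base = entropy([r[sensitive] for r in records])
    # Group the sensitive values by the value of the candidate attribute.
    groups = {}
    for r in records:
        groups.setdefault(r[attribute], []).append(r[sensitive])
    conditional = sum(len(g) / len(records) * entropy(g) for g in groups.values())
    split_info = entropy([r[attribute] for r in records])
    if split_info == 0:
        return 0.0  # attribute takes a single value, so it cannot split the data
    return (base - conditional) / split_info


def best_quasi_identifier(records, quasi_identifiers, sensitive):
    """Pick the quasi-identifier with the highest ratio as the next split."""
    return max(quasi_identifiers, key=lambda a: gain_ratio(records, a, sensitive))


# Hypothetical toy data: "age" fully determines "disease", "zip" tells nothing.
records = [
    {"age": "young", "zip": "A", "disease": "flu"},
    {"age": "young", "zip": "B", "disease": "flu"},
    {"age": "old", "zip": "A", "disease": "cancer"},
    {"age": "old", "zip": "B", "disease": "cancer"},
]
best = best_quasi_identifier(records, ["age", "zip"], "disease")
print(best)  # age
```

In this toy case the split on `age` is chosen because it separates the sensitive values perfectly, while `zip` carries no information about them; the paper's own measure and its anonymization loss metric may weight attributes differently.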
Source
《计算机科学》
CSCD
PKU Core Journal (北大核心)
2017, No. 7, pp. 42-46 (5 pages)
Computer Science
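Funding
National Natural Science Foundation of China (61303232, 61540049)
Guizhou Province Major Basic Research Project (黔科合JZ字[2014]2001-21)
Guizhou University Graduate Innovation Fund (college-level project)
Key Scientific Research Project of Higher Education Institutions of Henan Province (16A520025)
Xuchang University Outstanding Young Backbone Teacher Funding Program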
Keywords
Privacy protection
Classification anonymization
Weight attribute entropy
Classification accuracy