Abstract
To strengthen the protection of individuals and their private information, improve data utility, and reduce time cost, an (α,k)-anonymity model based on semi-supervised clustering was proposed, an algorithm implementing the model was designed, and the algorithm's time complexity was analyzed. Because the data set contains both numeric and categorical attributes, both kinds of attributes were mapped into the same metric space, and the distances between tuples were represented by a dissimilarity matrix, so that identical or similar tuples are effectively gathered into the same cluster. Highly sensitive attributes were assigned a higher degree of protection and less sensitive attributes a lower degree, achieving personalized protection of sensitive attributes. Experimental results show that the semi-supervised (α,k)-anonymity model achieves privacy preservation safely and efficiently while ensuring the quality of the published data.
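The abstract does not give the exact distance function, clustering procedure, or α assignment, so the following is only a minimal Python sketch under stated assumptions. A Gower-style dissimilarity (range-normalized absolute difference for numeric attributes, 0/1 mismatch for categorical attributes, averaged) stands in for the common metric space; a plain greedy grouping into clusters of at least k tuples stands in for the semi-supervised clustering (it uses no labels or constraints, so it is not the paper's method); and personalized protection is modeled as a per-sensitive-value α bound, where more sensitive values get a smaller α. The function names `dissimilarity_matrix`, `greedy_k_clusters`, and `satisfies_alpha_k` and the toy data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def dissimilarity_matrix(rows: Sequence[Sequence], numeric_idx: List[int],
                         categorical_idx: List[int]) -> List[List[float]]:
    """Assumed Gower-style dissimilarity: mixed attributes on a common [0, 1] scale."""
    n = len(rows)
    # Range of each numeric attribute, used to normalize differences to [0, 1].
    ranges = {}
    for j in numeric_idx:
        col = [float(r[j]) for r in rows]
        ranges[j] = (max(col) - min(col)) or 1.0  # avoid division by zero
    m = len(numeric_idx) + len(categorical_idx)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = sum(abs(float(rows[a][j]) - float(rows[b][j])) / ranges[j]
                        for j in numeric_idx)
            # Categorical attributes contribute 0 if equal, 1 otherwise.
            total += sum(rows[a][j] != rows[b][j] for j in categorical_idx)
            d[a][b] = d[b][a] = total / m
    return d


def greedy_k_clusters(d: List[List[float]], k: int) -> List[List[int]]:
    """Hypothetical greedy grouping: every cluster ends up with at least k tuples."""
    if len(d) < k:
        raise ValueError("need at least k tuples")
    unassigned = list(range(len(d)))
    clusters: List[List[int]] = []
    while len(unassigned) >= k:
        seed = unassigned[0]
        # The seed plus its k-1 nearest unassigned tuples form one cluster.
        members = sorted(unassigned, key=lambda i: d[seed][i])[:k]
        clusters.append(members)
        unassigned = [i for i in unassigned if i not in members]
    # Fewer than k tuples left: attach each to the cluster with the smallest mean distance.
    for i in unassigned:
        best = min(clusters, key=lambda c: sum(d[i][j] for j in c) / len(c))
        best.append(i)
    return clusters


def satisfies_alpha_k(clusters: List[List[int]], sensitive: Sequence[str], k: int,
                      alpha_by_value: Dict[str, float], default_alpha: float) -> bool:
    """Each cluster has >= k tuples and each sensitive value's share is <= its alpha."""
    for c in clusters:
        if len(c) < k:
            return False
        counts = Counter(sensitive[i] for i in c)
        for value, cnt in counts.items():
            # Highly sensitive values are given a smaller (stricter) alpha.
            if cnt / len(c) > alpha_by_value.get(value, default_alpha):
                return False
    return True


if __name__ == "__main__":
    # Made-up toy data: quasi-identifiers (age, sex) and a sensitive diagnosis.
    rows = [(25, "M"), (26, "M"), (27, "M"), (52, "F"), (55, "F"), (58, "F")]
    diagnosis = ["flu", "HIV", "flu", "cancer", "flu", "flu"]
    d = dissimilarity_matrix(rows, numeric_idx=[0], categorical_idx=[1])
    clusters = greedy_k_clusters(d, k=3)
    print(clusters)  # [[0, 1, 2], [3, 4, 5]]
    alphas = {"HIV": 0.34, "cancer": 0.34}  # stricter bound for high-sensitivity values
    print(satisfies_alpha_k(clusters, diagnosis, 3, alphas, default_alpha=0.7))  # True
```

In this sketch, tightening `default_alpha` to 0.5 makes the check fail, because "flu" then accounts for two of the three tuples in each cluster. Assigning α per sensitive value rather than one global α is the assumed reading of "personalized protection" here.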
Source
Journal of Harbin Engineering University, 2011, No. 11, pp. 1489-1494 (6 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journals
Funding
National Natural Science Foundation of China (61073043, 61073041, 60873037)
Natural Science Foundation of Heilongjiang Province (F200901)
Keywords
data publishing
privacy preserving
anonymous data
semi-supervised clustering