Abstract
Search log analysis is an important research topic in data mining and machine learning, and protecting the private data contained in web search logs has become a major challenge. This paper proposes an anonymized search-log publishing method that combines classification-based anonymization with differential privacy. First, the idea of k-anonymity and the classification-based anonymization technique are extended to clustering: quasi-identifier attributes are generalized by classification to guide cluster formation, and a proposed query-term similarity measure improves clustering accuracy. Second, exponential noise is added within each cluster so that the published data satisfy differential privacy. Finally, the processed data set is published. Experiments show that the method effectively prevents leakage of sensitive information in search logs while improving data utility.
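The abstract names the three steps but gives no formulas. The Python sketch below illustrates one possible reading of them under explicit assumptions: Jaccard similarity over each user's set of query terms stands in for the paper's (unspecified) query-term similarity measure; a greedy grouping into clusters of at least k users stands in for its classification-guided clustering; and "exponential noise" is read as the exponential mechanism of differential privacy, which picks one representative query per cluster from a public candidate vocabulary. All function names (`jaccard`, `k_clusters`, `exponential_mechanism`, `publish`) are hypothetical. Only the selection step is a differentially private primitive; the data-dependent clustering step would need its own privacy analysis.

```python
import math
import random
from collections import Counter


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two term sets (assumed stand-in for the paper's measure)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def k_clusters(users: dict[str, set[str]], k: int) -> list[list[str]]:
    """Greedily group users into clusters of at least k members by term similarity."""
    if k < 1 or len(users) < k:
        raise ValueError("need k >= 1 and at least k users")
    remaining = sorted(users)
    clusters: list[list[str]] = []
    while len(remaining) >= 2 * k:
        seed = remaining.pop(0)
        # Order the rest by similarity to the seed user (stable for ties).
        remaining.sort(key=lambda u: jaccard(users[seed], users[u]), reverse=True)
        clusters.append([seed] + remaining[: k - 1])
        remaining = remaining[k - 1 :]
    # Between k and 2k-1 users are left here, so the last cluster also has >= k members.
    clusters.append(remaining)
    return clusters


def exponential_mechanism(counts: Counter, domain: list[str], epsilon: float,
                          rng: random.Random, sensitivity: float = 1.0) -> str:
    """Pick q from a fixed public domain with Pr[q] proportional to exp(eps*u(q)/(2*sensitivity))."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    candidates = list(domain)
    # Utility u(q) = number of users in the cluster who issued q (missing terms count 0).
    scores = [epsilon * counts[q] / (2 * sensitivity) for q in candidates]
    top = max(scores)
    # Subtracting the max score avoids overflow and leaves the probabilities unchanged.
    weights = [math.exp(s - top) for s in scores]
    return rng.choices(candidates, weights=weights, k=1)[0]


def publish(users: dict[str, set[str]], k: int, epsilon: float,
            domain: list[str], seed: int = 0) -> list[tuple[int, str]]:
    """Release (cluster size, representative query) pairs; user IDs are never released."""
    rng = random.Random(seed)
    released = []
    for cluster in k_clusters(users, k):
        # Each user counts at most once per term, so adding or removing one user
        # changes any count by at most 1 (utility sensitivity 1 within a cluster).
        counts = Counter(term for u in cluster for term in users[u])
        released.append((len(cluster), exponential_mechanism(counts, domain, epsilon, rng)))
    return released


if __name__ == "__main__":
    vocabulary = ["flu", "fever", "cough", "flights", "hotel", "visa"]
    logs = {
        "u1": {"flu", "fever"},
        "u2": {"flu", "cough"},
        "u3": {"flights", "hotel"},
        "u4": {"hotel", "visa"},
    }
    for size, query in publish(logs, k=2, epsilon=1.0, domain=vocabulary, seed=7):
        print(f"cluster of {size} users -> representative query: {query}")
```

A larger `epsilon` concentrates each choice on the cluster's most frequent query (better utility, weaker privacy); a smaller one pushes the choice toward uniform over the vocabulary. Drawing candidates from a fixed public vocabulary, rather than from the queries observed in the log, is what keeps the selection step itself differentially private.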
Source
Journal of Information Security Research (《信息安全研究》)
2016, No. 3, pp. 251-257 (7 pages)
Funding
Beijing Social Science Foundation Project (15JGB099)
Beijing Outstanding Talent Training Grant (2013E005007000001)
National Natural Science Foundation of China (61370139)
Keywords
differential privacy
privacy preserving
search log
data publishing
classification technique