Abstract
Privacy and security in big data have become a widespread concern. By identifying a user's accounts across different websites, an attacker can assemble a complete profile of that user and thereby threaten the user's privacy. Simulating and assessing an attacker's re-identification capability is a prerequisite for protecting user privacy. This paper therefore proposes a high-similarity same-day same-behavior algorithm. The algorithm decides whether accounts on different websites belong to the same user by checking whether they repeatedly publish similar or identical content on the same day, and it builds a weighting model over user attributes to further improve re-identification accuracy. In experiments on more than ten thousand users of two major Chinese social networking sites, the algorithm performed well. The results show that, even when users' social relations are ignored, users' tweets and attributes still provide enough information for an attacker to link a user's accounts across websites, leading to further privacy leakage.
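The same-day matching idea described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the similarity measure (Jaccard over character bigrams), the thresholds `sim_threshold` and `min_days`, and all function names are assumptions chosen for illustration, and the attribute weighting model is left out.

```python
from collections import defaultdict
from datetime import date


def char_bigrams(text):
    """Return the set of character bigrams of a lowercased, whitespace-free string."""
    s = "".join(text.split()).lower()
    if len(s) < 2:
        return {s} if s else set()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def jaccard(a, b):
    """Jaccard similarity of two posts over character bigrams (an assumed measure)."""
    ga, gb = char_bigrams(a), char_bigrams(b)
    if not ga or not gb:
        return 0.0
    return len(ga & gb) / len(ga | gb)


def same_day_matches(posts_a, posts_b, sim_threshold=0.8):
    """Count the days on which both accounts posted similar or identical content.

    posts_a, posts_b: lists of (date, text) pairs, one list per account.
    sim_threshold: hypothetical cut-off above which two posts count as similar.
    """
    by_day_b = defaultdict(list)
    for day, text in posts_b:
        by_day_b[day].append(text)

    matched_days = set()
    for day, text in posts_a:
        candidates = by_day_b.get(day, [])
        if any(jaccard(text, other) >= sim_threshold for other in candidates):
            matched_days.add(day)
    return len(matched_days)


def likely_same_user(posts_a, posts_b, sim_threshold=0.8, min_days=2):
    """Flag two accounts as a likely match if enough days show similar posts."""
    return same_day_matches(posts_a, posts_b, sim_threshold) >= min_days


if __name__ == "__main__":
    site_a = [(date(2017, 1, 1), "hello world from the beach"),
              (date(2017, 1, 5), "new phone arrived today")]
    site_b = [(date(2017, 1, 1), "hello world from the beach"),
              (date(2017, 1, 5), "new phone arrived today!")]
    print(likely_same_user(site_a, site_b))
```

Requiring several matching days rather than one reflects the abstract's emphasis on repeated same-day behavior: a single shared post could be a coincidence or a widely reposted item. A fuller version would combine this count with a weighted attribute score.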
Source
《计算机系统应用》
2017, No. 12, pp. 94-103 (10 pages)
Computer Systems & Applications
Funding
Key Program of the National Natural Science Foundation of China (61232005)
National Natural Science Foundation of China (61402456)
Keywords
social network
user re-identification
tweets
attributes
similarity