Abstract
Users usually register accounts on more than one website, so many overlapping user identities exist on the Internet. Identifying and linking these identities across websites is valuable in both commercial applications and network security. However, for privacy reasons, Internet users often provide personal information that is incomplete or partly false, which makes the task harder. Usernames, by contrast, are easy to collect, do not raise privacy issues, and can reflect their owners' characteristics and habits. We therefore propose a methodology for linking user identities across multiple websites that relies only on usernames. After formalizing the problem, we present the framework of our solution. We focus mainly on username features, which we classify into two categories, surface features and comparison features, and we quantitatively analyze the probability distributions of these features. Building on this analysis, we propose a method that computes an identification score indicating whether two usernames belong to the same owner. Moreover, given a single username, we propose a method for retrieving its owner's other likely usernames from a candidate username set. Finally, we evaluate the effectiveness of the methodology through a series of empirical studies on large-scale real-world username datasets.
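The two tasks named in the abstract, scoring a username pair and retrieving likely matches for a single username, can be illustrated with a minimal Python sketch. It is not the authors' method: the two comparison features used here (a longest-common-substring ratio and a normalized edit-distance similarity), the equal weights, the 0.6 threshold, and all function names are hypothetical choices made only for illustration.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Hypothetical comparison feature: 1 - distance / length of the longer name."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def lcs_substring_ratio(a: str, b: str) -> float:
    """Hypothetical comparison feature: longest common substring / shorter name length."""
    a, b = a.lower(), b.lower()
    if not a or not b:
        return 0.0
    match = SequenceMatcher(None, a, b).find_longest_match(0, len(a), 0, len(b))
    return match.size / min(len(a), len(b))


def identification_score(u1: str, u2: str, w_lcs: float = 0.5, w_edit: float = 0.5) -> float:
    """Hypothetical pairwise score: weighted sum of the two comparison features."""
    return w_lcs * lcs_substring_ratio(u1, u2) + w_edit * edit_similarity(u1, u2)


def rank_candidates(query: str, candidates: list[str],
                    threshold: float = 0.6, top_k: int = 5) -> list[tuple[str, float]]:
    """Score every candidate against the query, keep those above the
    (hypothetical) threshold, and return the top_k highest-scoring ones."""
    scored = [(c, identification_score(query, c)) for c in candidates]
    scored = [(c, s) for c, s in scored if s >= threshold]
    scored.sort(key=lambda cs: cs[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    print(rank_candidates("alice_smith", ["alicesmith88", "bob_jones", "a_smith"], threshold=0.0))
```

Scoring pairs and then thresholding and sorting the scores over a candidate set keeps the retrieval task a thin layer on top of the pairwise judgement; a real system would replace the hand-set weights with ones learned from labeled username pairs.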
Source
Chinese Journal of Computers (《计算机学报》)
Indexed in: EI, CSCD, Peking University Core Journals
2015, No. 10, pp. 2028-2040 (13 pages)
Funding
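National High Technology Research and Development Program of China ("863" Program) (2012AA01A401, 2012AA01A402)
National Basic Research Program of China ("973" Program) (2013CB329601, 2013CB329602)
National Key Technology R&D Program of China (2012BAH38B04, 2012BAH38B06)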
Keywords
username feature
linking user identities
multiple websites
social networks
social computing