Abstract
Subjective weighting algorithms ignore the special meanings and roles that individual attributes have in identity matching, which leads to low identification precision. To solve this problem, an Information Entropy based Multiple Social Networks User Identification Algorithm (IE-MSNUIA) was proposed. First, the data types and physical meanings of the different attributes were analyzed, and a suitable similarity calculation method was adopted for each. Second, the weight of each attribute was determined according to its information entropy, so that the potential information of each attribute could be fully exploited. Finally, all chosen attributes were integrated to decide whether an account pair was a match. Theoretical analysis and experimental results show that, compared with machine learning based algorithms and subjective weighting algorithms, the proposed algorithm improves every performance metric: across different datasets, its average precision reaches 97.2%, its average recall reaches 94.1%, and its average comprehensive evaluation metric reaches 95.6%. The proposed algorithm can accurately identify the multiple accounts that a user holds across different social networks.
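The weighting and fusion steps described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the code below assumes the widely used entropy weight method (attributes whose similarity values vary more across candidate pairs receive larger weights) and a simple weighted-sum decision. The function names `entropy_weights` and `is_match`, the similarity matrix layout, and the `threshold` value are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def entropy_weights(sim_matrix):
    """Assumed entropy weight method (not taken from the paper).

    sim_matrix: list of rows, one row per candidate account pair,
    one column per attribute, holding non-negative similarity values.
    Returns one weight per attribute; the weights sum to 1.
    """
    n = len(sim_matrix)
    m = len(sim_matrix[0])
    if n < 2:
        # Entropy normalisation needs at least two samples.
        return [1.0 / m] * m

    diversities = []
    for j in range(m):
        column = [row[j] for row in sim_matrix]
        total = sum(column)
        if total == 0:
            entropy = 1.0  # an all-zero column carries no information
        else:
            probs = [v / total for v in column]
            # Normalised Shannon entropy in [0, 1].
            entropy = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        # Clamp to guard against tiny negative values from rounding.
        diversities.append(max(0.0, 1.0 - entropy))

    spread = sum(diversities)
    if spread == 0:
        return [1.0 / m] * m  # every attribute is equally uninformative
    return [d / spread for d in diversities]


def is_match(similarities, weights, threshold=0.5):
    """Fuse per-attribute similarities with a weighted sum (hypothetical rule)."""
    score = sum(w * s for w, s in zip(weights, similarities))
    return score >= threshold
```

In this sketch an attribute whose similarity is the same for every candidate pair gets a weight close to zero, because it cannot help distinguish matched from unmatched accounts, while an attribute that separates the pairs sharply dominates the fused score.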
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2017, Issue 8, pp. 2374-2380 (7 pages)
Funding
National Natural Science Foundation of China (61379151)
National Key Technology R&D Program of China (2014BAH30B01)
Keywords
user identification
attribute similarity
information entropy
information integration
online social network