Abstract
Social networks restrict access to user topology, which greatly reduces the accuracy of identification methods that rely on structural features. We present Proximity and Content based User Identification, a semi-supervised network model built on XGBoost that fuses attribute, structural, and content features and casts cross-social-network user identification as a binary classification task. To tackle incomplete user topology and insufficient seed users, we propose a method for extracting explicit and implicit friends. The friend networks of each user pair to be matched are fused according to the explicitly matched user pairs, the implicitly matched user pairs, and the remaining friends in those networks. User importance is then used to improve the empirical probability of second-order proximity in the LINE algorithm, which yields the structural features of the user pair. As content features, we extract the time-series features of users' posting times, the keyword overlap of user-generated content, and the tags of followed users. Finally, the attribute, structural, and content features are fused to complete user identification. Experiments on real-world datasets demonstrate the effectiveness of the method.
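The abstract names two concrete feature computations: the empirical probability behind LINE's second-order proximity, and a keyword-overlap score between two accounts. The sketch below illustrates both under stated assumptions. The first branch is standard LINE, where the empirical probability is w_ij / d_i with d_i the weighted out-degree of v_i. The importance-weighted branch is a hypothetical variant, because the abstract does not give the paper's formula: it scales each neighbour's edge weight by an assumed importance score r_j before normalising. The keyword overlap is assumed to be a Jaccard ratio of keyword sets. All function and parameter names are illustrative.

```python
from collections import defaultdict


def second_order_empirical_prob(edges, importance=None):
    """Empirical distribution p_hat(v_j | v_i) over each node's neighbours.

    Standard LINE: p_hat(v_j | v_i) = w_ij / d_i, with d_i = sum_k w_ik.
    If `importance` is given (HYPOTHETICAL variant, not the paper's formula),
    each weight is scaled by the neighbour's importance before normalising:
    p_hat(v_j | v_i) = w_ij * r_j / sum_k (w_ik * r_k).
    `edges` holds directed (i, j, w) triples; list undirected edges both ways.
    """
    out = defaultdict(dict)
    for i, j, w in edges:
        out[i][j] = out[i].get(j, 0.0) + w  # merge parallel edges

    probs = {}
    for i, nbrs in out.items():
        # Missing importance scores default to 1.0 (plain LINE weighting).
        scaled = {
            j: w * (importance.get(j, 1.0) if importance else 1.0)
            for j, w in nbrs.items()
        }
        total = sum(scaled.values())
        if total > 0:
            probs[i] = {j: s / total for j, s in scaled.items()}
    return probs


def keyword_overlap(keywords_a, keywords_b):
    """Jaccard overlap of two accounts' keyword sets (assumed measure)."""
    a, b = set(keywords_a), set(keywords_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


edges = [("a", "b", 1.0), ("a", "c", 3.0)]
print(second_order_empirical_prob(edges))              # {'a': {'b': 0.25, 'c': 0.75}}
print(second_order_empirical_prob(edges, {"b": 3.0}))  # {'a': {'b': 0.5, 'c': 0.5}}
print(keyword_overlap(["lstm", "xgboost", "graph"], ["graph", "xgboost", "line"]))  # 0.5
```

In a pipeline like the one the abstract describes, scalars such as these would be concatenated with the attribute features into one vector per candidate account pair. That vector, labelled 1 for the same person and 0 otherwise, is the input to the binary classifier.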
Authors
卢菁
尤晨璐
盖祺凯
刘丛
LU Jing; YOU Chenlu; GAI Qikai; LIU Cong (School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)
Source
Journal of Applied Sciences (《应用科学学报》)
Indexed in: CAS, CSCD, PKU Core Journals (北大核心)
2024, No. 6, pp. 1064-1077 (14 pages)
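Funding
Supported by the Natural Science Foundation Cultivation Project of the University of Shanghai for Science and Technology (No. 20ZRPY08).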