Abstract
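Inferring users' geographic locations plays an important role in location-based applications such as natural disaster monitoring, influenza trend prediction, and targeted advertising. Existing methods mainly use text content and social networks for location inference; they neither fully mine and fuse the two kinds of information nor handle isolated users in the social network well. This paper therefore proposes a home location inference method for social network users that fuses text topics with a social relationship graph neural network (Social Relationship Graph Convolutional Network, SRGCN). The method has three steps. First, hybrid features are extracted from text content: TF-IDF yields text feature vectors, and an initial social relationship graph is built from the mentions between users. Second, because the social relationship graph contains isolated users whose locations are hard to estimate, a topic model is built and isolated users are connected to other users according to topic vector similarity, supplementing the graph. Finally, a graph convolutional network processes the social relationship graph data and jointly models text features and network structure to infer users' geographic locations. The effect of the topic similarity threshold on inference performance and graph size is examined on the real-world benchmark dataset GeoText. The experimental results show that the method increases the proportion of locatable users and clusters most user nodes of the same class together. SRGCN outperforms existing methods in mean distance error, median distance error, and inference accuracy; on GeoText, its Acc@161 is 1% higher than that of the best-performing GCN, and its mean distance error is 16 km lower. The results confirm the effectiveness of SRGCN, which can improve the accuracy of home location inference for users.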
Prediction of users' geolocation plays an important role in location-based applications such as natural disaster monitoring, flu trend prediction, and targeted advertising. Integrating multi-source information, mining user behavior characteristics, and analyzing user social attributes can help improve prediction accuracy and reduce distance error. Existing methods rely primarily on textual content and social networks for location prediction, but they do not fully fuse these two types of information and have difficulty predicting the locations of isolated users in social networks. Therefore, this paper proposes a home location prediction method for social network users that integrates text topics with a social relationship graph neural network (Social Relationship Graph Convolutional Network, SRGCN). First, hybrid features are extracted from text content: TF-IDF is used to obtain text feature vectors, and an initial social relationship graph is built from the mentions between users. Then, to address the presence of isolated users in the social relationship graph and the difficulty of estimating their locations, a topic model is built and isolated users are connected to other users based on topic vector similarity, which supplements the social relationship graph. Finally, a graph convolutional neural network processes the social relationship graph data, jointly modeling text features and network structure to predict users' geolocation. The effect of the topic similarity threshold on prediction performance and graph size is explored on the real-world benchmark dataset GeoText. The experimental results show that the method aggregates most of the user nodes belonging to the same class and increases the proportion of locatable users. A network constructed from multiple types of relationships preserves the diversity of user relationships and yields better prediction accuracy for the graph neural network. SRGCN outperforms the existing methods in terms of average distance error, median distance error, and prediction accuracy, which indicates that the multi-view feature learning model is superior for geolocation prediction to models based on a single source of information. On the GeoText dataset, the Acc@161 of SRGCN is 1% higher than that of the GCN method, and the average distance error is reduced by 16 km, which indicates that SRGCN is more competitive than the existing best-performing method. The experimental results demonstrate the effectiveness of SRGCN, which can improve the accuracy of home location prediction for users.
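The graph-supplementing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the similarity measure, the topic model, or the threshold value, so cosine similarity, the hand-written two-dimensional topic vectors, the threshold of 0.95, and the function names `build_mention_graph` and `connect_isolated` are all assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def build_mention_graph(users, mentions):
    """Undirected graph: an edge joins two users when one mentions the other."""
    graph = {u: set() for u in users}
    for src, dst in mentions:
        if src in graph and dst in graph and src != dst:
            graph[src].add(dst)
            graph[dst].add(src)
    return graph


def connect_isolated(graph, topic_vectors, threshold):
    """Link each isolated user to every user whose topic vector is similar enough.

    Linking to all users above the threshold (rather than, say, the single
    most similar user) is an assumption of this sketch.
    """
    isolated = [u for u, neighbours in graph.items() if not neighbours]
    for u in isolated:
        for v in graph:
            if v != u and cosine(topic_vectors[u], topic_vectors[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


# Toy data: users "c" and "d" have no mentions, so they start out isolated.
users = ["a", "b", "c", "d"]
mentions = [("a", "b")]
topic_vectors = {
    "a": [0.9, 0.1],
    "b": [0.8, 0.2],
    "c": [0.85, 0.15],  # close to "a" and "b" in topic space
    "d": [0.1, 0.9],    # dissimilar to everyone else
}

graph = build_mention_graph(users, mentions)
graph = connect_isolated(graph, topic_vectors, threshold=0.95)
```

In this toy example user "c" gains edges to "a" and "b", while "d" stays isolated because no topic vector is similar enough. Lowering the threshold would connect more users at the cost of a larger and noisier graph, which is the trade-off between performance and graph size that the abstract reports examining on GeoText.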
Authors
高嘉媛
熊伟
陈荦
欧阳雪
杨凯钧
GAO Jiayuan; XIONG Wei; CHEN Luo; OUYANG Xue; YANG Kaijun (College of Electronic Science, National University of Defense Technology, Changsha 410073, China; The Second Surveying and Mapping Institute of Hunan Province, Changsha 410119, China)
Source
《地球信息科学学报》
EI
CSCD
Peking University Core Journal
2024, Issue 2, pp. 488-498 (11 pages)
Journal of Geo-information Science
Funding
National Natural Science Foundation of China (U19A2058).