Abstract
Microblog users use tags to represent their interests and attributes. By analyzing the characteristics of microblog user tags and the limitations of existing microblog recommendation methods, we propose an improved microblog user interest modeling method based on multi-tag semantic correlation. The method addresses a shortcoming of existing tagging methods, which ignore both semantic correlation and the correlation among multiple tags. First, the co-occurrence frequency of each tag pair in the microblog user set is computed to obtain the inner semantic correlation of the tag pair. Second, a path composed of the link tags of each tag pair is constructed, and the outer semantic correlation of the tag pair is computed via shared entropy. Finally, the two correlations are combined into a tag-pair semantic correlation matrix, which is used to update the user-tag matrix and yields the microblog user interest model based on multi-tag semantic correlation. A series of experiments on a large dataset of microblog posts crawled through the open API of Sina Weibo shows that the proposed model outperforms traditional user interest discovery methods.
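The abstract outlines a three-step pipeline but gives no formulas. The Python sketch below illustrates only the overall shape of that pipeline under loudly stated assumptions: the Jaccard-normalized co-occurrence used for inner correlation, the two-hop path score (product of inner correlations through the strongest link tag) that stands in for the paper's shared-entropy outer correlation, the mixing weight `alpha`, and the update U' = U x S are all hypothetical choices, not the authors' definitions. All function names are illustrative.

```python
from itertools import combinations


def build_vocab(user_tags):
    """Sorted list of every distinct tag in the user set."""
    return sorted({tag for tags in user_tags.values() for tag in tags})


def inner_correlation(user_tags, vocab):
    """Assumed inner correlation: Jaccard-normalized co-occurrence,
    |users with a and b| / |users with a or b|."""
    idx = {tag: i for i, tag in enumerate(vocab)}
    n = len(vocab)
    count = [0] * n                      # users carrying each tag
    co = [[0] * n for _ in range(n)]     # users carrying both tags
    for tags in user_tags.values():
        ids = sorted({idx[t] for t in tags})
        for i in ids:
            count[i] += 1
        for i, j in combinations(ids, 2):
            co[i][j] += 1
            co[j][i] += 1
    inner = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and co[i][j]:
                inner[i][j] = co[i][j] / (count[i] + count[j] - co[i][j])
    return inner


def outer_correlation(inner):
    """Stand-in for the shared-entropy step (NOT the paper's formula):
    score each path a -> c -> b through a link tag c by
    inner[a][c] * inner[c][b] and keep the strongest one."""
    n = len(inner)
    outer = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            if a != b:
                outer[a][b] = max(
                    (inner[a][c] * inner[c][b] for c in range(n) if c not in (a, b)),
                    default=0.0,
                )
    return outer


def semantic_correlation(inner, outer, alpha=0.5):
    """S = alpha * inner + (1 - alpha) * outer, with 1 on the diagonal.
    alpha is a hypothetical mixing weight."""
    n = len(inner)
    return [
        [1.0 if i == j else alpha * inner[i][j] + (1 - alpha) * outer[i][j] for j in range(n)]
        for i in range(n)
    ]


def update_user_tag_matrix(user_tags, vocab, corr):
    """Assumed update U' = U x S, where U is the binary user-tag matrix."""
    idx = {tag: i for i, tag in enumerate(vocab)}
    n = len(vocab)
    profiles = {}
    for user, tags in user_tags.items():
        row = [0.0] * n
        for t in tags:
            row[idx[t]] = 1.0
        profiles[user] = [sum(row[k] * corr[k][j] for k in range(n)) for j in range(n)]
    return profiles


if __name__ == "__main__":
    users = {
        "u1": {"music", "movies"},
        "u2": {"movies", "travel"},
        "u3": {"music", "movies", "travel"},
    }
    vocab = build_vocab(users)                      # ['movies', 'music', 'travel']
    inner = inner_correlation(users, vocab)
    corr = semantic_correlation(inner, outer_correlation(inner))
    profiles = update_user_tag_matrix(users, vocab, corr)
    print([round(x, 3) for x in profiles["u1"]])    # [1.444, 1.444, 0.833]
```

With this toy data, user u1 never used `travel` yet receives a travel score of about 0.833, because `travel` is correlated with both of u1's tags. Propagating interest to semantically related but unused tags is the effect the correlation-based update of the user-tag matrix is meant to capture.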
Authors
WANG Yan-ru (王艳茹); MA Hui-fang (马慧芳); LIU Hai-jiao (刘海姣); WEI Jia-hui (魏家辉)
College of Computer Science and Engineering, Northwest Normal University, Lanzhou 730070, China; Guangxi Key Laboratory of Trusted Software, Guilin University of Electronic Technology, Guilin 541004, China
Source
Computer Engineering & Science (《计算机工程与科学》), 2018, No. 11, pp. 2067-2073 (7 pages)
Indexed in: CSCD; Peking University Core Journals (北大核心)
Funding
National Natural Science Foundation of China (61363058, 61762078); Research Project of Guangxi Key Laboratory of Trusted Software (kx201705)