Abstract
[Purpose/significance] This study mines the images, posts, and user tags shared by social media users to identify and predict their real interests, in support of better personalized recommendation and more precisely targeted services. [Method/process] A crawler targeting microblog users first collects image, text, and tag data. Three user interest feature sets are then constructed using CNN, Word2vec, and BOW, and an SVM classifier is trained on these features to predict user interest categories. The study compares classification metrics between single-mode and multi-mode data and examines the multi-class classification problem on multi-mode data under supervised learning. [Result/conclusion] Experimental results show that classifying user interest with multi-mode data combining images, posts, and user tags achieves an F1 score of 77%, a 10% improvement over the best single-mode data. Multi-mode data (images, posts, and tags) thus improves classification performance over single-mode data, and the work provides a theoretical and technical basis for research on applications of multi-mode data.
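The abstract does not say how the three feature sets are combined or which averaging the reported F1 score uses. The following is a minimal sketch under two labeled assumptions: that fusion is a simple concatenation of per-modality vectors (early fusion), and that the F1 score is macro-averaged across interest classes. The function names `l2_normalize`, `fuse_features`, and `macro_f1` are hypothetical and do not come from the paper.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so no modality dominates by magnitude."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def fuse_features(
    image: Sequence[float], text: Sequence[float], tags: Sequence[float]
) -> List[float]:
    """Early fusion (assumed, not confirmed by the paper):
    normalize each modality's vector, then concatenate them."""
    return l2_normalize(image) + l2_normalize(text) + l2_normalize(tags)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1 over all classes seen in either list
    (macro averaging is an assumption; the paper only reports an F value)."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

In a full pipeline, the fused vectors would be passed to an SVM classifier (for example, scikit-learn's `SVC`), and `macro_f1` would be computed separately for each single-mode feature set and for the fused set to make the comparison the abstract describes.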
Source
Information Science (《情报科学》)
CSSCI
PKU Core Journal (北大核心)
2018, No. 1, pp. 124-129 (6 pages)
Funding
National Natural Science Foundation of China, General Program (71473183)
Keywords
social network
data mining
user interest identification
multi-mode data
user interest classification