The Impact of Document Representation on Hashtag Recommendation for Microblog: Use H7N9 Corpus on Twitter as Test Dataset

Abstract: Building on a review of existing Hashtag recommendation methods and short-text representation methods, this paper adopts a Hashtag recommendation method based on K-Nearest Neighbor (KNN). Microblog texts are first represented as vectors, and the similarity between the target microblog and each microblog in the corpus is computed. The most similar microblogs are then selected from the corpus, and candidate Hashtags are extracted from them. The paper compares how four text representation methods, the Vector Space Model (VSM), Latent Semantic Analysis (LSA), Latent Dirichlet Allocation (LDA), and Deep Learning (DL), affect KNN-based Hashtag recommendation. Using the H7N9 microblog corpus from Twitter as the test dataset, the experimental results show that the deep learning text representation achieves the best performance of the four methods in KNN-based Hashtag recommendation.
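The KNN pipeline described in the abstract (vectorize the microblogs, rank the corpus by similarity to the target microblog, then pool candidate Hashtags from the nearest neighbors) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses only the VSM representation with TF-IDF weights and cosine similarity. The tokenizer, the smoothed IDF formula, the value of K, and the similarity-weighted voting over the neighbors' Hashtags are all assumptions. The function names (`recommend_hashtags` and its helpers) and the tweets in `TOY_CORPUS` are hypothetical and are not taken from the H7N9 dataset.

```python
import math
import re
from collections import Counter

# Assumed preprocessing: a Hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the lowercased Hashtags of a microblog, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def tokenize(text):
    """Lowercase alphanumeric tokens; Hashtags are stripped so they do not affect similarity."""
    return re.findall(r"[a-z0-9]+", HASHTAG_RE.sub(" ", text).lower())


def build_idf(token_lists):
    """Smoothed IDF over the training corpus: log((1 + n) / (1 + df)) + 1 (an assumed variant)."""
    n = len(token_lists)
    df = Counter(term for tokens in token_lists for term in set(tokens))
    return {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}


def tfidf(tokens, idf):
    """Sparse VSM vector; terms absent from the corpus are dropped because they match nothing."""
    return {term: count * idf[term] for term, count in Counter(tokens).items() if term in idf}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts; 0.0 if either is empty."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def recommend_hashtags(query, corpus, k=3, top_n=3):
    """KNN Hashtag recommendation (hypothetical helper): keep the k corpus microblogs
    most similar to the query and score each neighbor's Hashtags by summed similarity."""
    token_lists = [tokenize(text) for text in corpus]
    idf = build_idf(token_lists)
    query_vec = tfidf(tokenize(query), idf)
    neighbors = sorted(
        ((cosine(query_vec, tfidf(tokens, idf)), i) for i, tokens in enumerate(token_lists)),
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, i in neighbors:
        for tag in extract_hashtags(corpus[i]):
            votes[tag] += sim
    # Drop candidates supported only by zero-similarity neighbors.
    return [tag for tag, score in votes.most_common(top_n) if score > 0]


# Invented toy tweets for illustration only; not from the paper's H7N9 corpus.
TOY_CORPUS = [
    "New H7N9 bird flu cases reported in Shanghai #H7N9 #birdflu",
    "Shanghai closes poultry markets after bird flu deaths #H7N9",
    "WHO says no evidence of human to human spread of bird flu #WHO #H7N9",
    "Great weather for the marathon this weekend #running",
]

if __name__ == "__main__":
    print(recommend_hashtags("More bird flu cases reported in Shanghai", TOY_CORPUS, k=2))
```

Running the script prints `['h7n9', 'birdflu']`. Only `tfidf` produces the vectors, so in principle an LSA, LDA, or deep-learning representation could be tried by swapping in a different vectorizer (with a similarity function suited to dense vectors) while leaving the neighbor search and Hashtag voting unchanged. That is the kind of comparison the paper reports.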

Authors: Jian Shao, Chengzhi Zhang

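Affiliations: Department of Information Management, Nanjing University of Science and Technology; Jiangsu Key Laboratory of Data Engineering and Knowledge Service (Nanjing University)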

Source: Library & Information (《图书与情报》), 2015, No. 3, pp. 17-25 (9 pages); indexed in CSSCI and the Peking University Core Journals list

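Funding: This paper is one of the research outcomes of the Major Project of the National Social Science Fund of China "Research on a Rapid-Response Intelligence System for Emergency Decision-Making in Unexpected Events" (Project No. 13&ZD174), the National Social Science Fund of China project "Research on User-Based Knowledge Organization Patterns in Online Social Networks" (Project No. 14BTQ033), and the Open Project of the Jiangsu Key Laboratory of Data Engineering and Knowledge Service "Discovery of Knowledge Structures and Interest Evolution of Interdisciplinary Users on Online Social Networks" (Project No. DEKS2014KT006).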

Keywords: Hashtag recommendation; K-Nearest Neighbor; text representation; deep learning

CLC number: G206.2 [Culture and Science: Communication Studies]
