
Research on Topic-Oriented Authoritative Information Retrieval Model in Micro-blog Site (cited by: 2)
Abstract To address the sparsity and time sensitivity of microblog data, this paper studies topic-oriented retrieval of authoritative information in microblog sites. First, it presents a method for extracting the implicit themes (latent topics) of microblogs, which eases the data sparsity of short microblog text. Second, it applies a two-stage clustering algorithm that groups the information in a microblog site by topic, which shortens search time. Finally, it proposes a ranking model for topic-oriented authoritative information: the model combines pseudo-relevance feedback under the KL-divergence language model with a time factor, and expands the query using the highly retweeted microblogs on the first page of the initial retrieval results. Experiments on a real dataset from Sina Weibo show that the proposed implicit theme model handles the data sparsity problem well, and that the authoritative information ranking model searches microblog sites more effectively than the other ranking algorithms it was compared with.
Source Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》), CSCD, 2013, No. 12, pp. 1135-1145 (11 pages)
Funding National Natural Science Foundation of China (Young Scientists Fund); Beijing Natural Science Foundation
Keywords microblog site; implicit theme; clustering; authoritative information
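
The ranking idea described in the abstract can be illustrated with a short Python sketch. This is not the authors' implementation: the abstract gives no formulas, so the Dirichlet smoothing (`mu`), the exponential time decay (`decay`), the interpolation weight (`alpha`), and all function names here are assumptions chosen for illustration.

```python
import math
from collections import Counter


def expand_query(query_model, feedback_posts, alpha=0.5):
    """Interpolate the query model with a feedback model.

    feedback_posts: token lists of the highly retweeted first-page posts
    (how those posts are selected is left to the caller; assumption).
    """
    counts = Counter(t for tokens in feedback_posts for t in tokens)
    total = sum(counts.values())
    if total == 0:
        return dict(query_model)
    words = set(query_model) | set(counts)
    return {
        w: (1 - alpha) * query_model.get(w, 0.0) + alpha * counts[w] / total
        for w in words
    }


def kl_score(query_model, doc_tokens, collection_prob, mu=100.0):
    """Rank-equivalent form of -KL(query || doc): sum_w p(w|q) * log p(w|d)."""
    counts = Counter(doc_tokens)
    doc_len = len(doc_tokens)
    score = 0.0
    for word, p_q in query_model.items():
        p_c = collection_prob.get(word, 1e-9)  # floor for unseen words (assumption)
        p_d = (counts[word] + mu * p_c) / (doc_len + mu)  # Dirichlet smoothing
        score += p_q * math.log(p_d)
    return score


def rank(query_model, posts, collection_prob, decay=0.1):
    """posts: list of (post_id, tokens, age_days). Returns ids, best first.

    The time factor is modeled as exp(-decay * age_days), added in log space
    (assumption; the paper's actual time factor is not given in the abstract).
    """
    scored = [
        (kl_score(query_model, tokens, collection_prob) - decay * age_days, post_id)
        for post_id, tokens, age_days in posts
    ]
    return [post_id for _, post_id in sorted(scored, reverse=True)]
```

In this sketch a first pass would call `rank` with the original query model, the caller would pick the most retweeted posts from the first page, and a second pass would call `rank` again with the output of `expand_query`.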

