Abstract
To improve the accuracy of microblog search, an adaptive indexing schema for microblog messages is proposed. First, the forwarding and reply relationships among messages are represented as trees, and these trees are encoded. Second, a content- and rank-based indexing schema is proposed, in which the in-memory index data is adjusted adaptively as new messages arrive. Finally, to avoid scanning the entire microblog data set during retrieval, a Top-k threshold optimization method is proposed. Experimental results on a Twitter data set show that the proposed schema reduces the time and space overhead of indexing microblog messages, and that its performance remains relatively stable over time.
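The abstract does not give the details of the Top-k threshold optimization, so the following is only a minimal sketch of the general idea of threshold-based early termination, not the paper's method. It assumes a hypothetical in-memory inverted index (`RankedIndex`) whose posting lists are kept sorted by a per-message rank score, and a query that matches a message if it contains any query term. The class name, the `rank` parameter, and the whitespace tokenization are all illustrative assumptions.

```python
import bisect
import heapq
from collections import defaultdict


class RankedIndex:
    """Toy in-memory inverted index (hypothetical, for illustration only)."""

    def __init__(self):
        # term -> list of (-rank, msg_id); ascending order on -rank
        # means descending order on rank.
        self.postings = defaultdict(list)

    def add(self, msg_id, text, rank):
        """Index a newly arrived message under each distinct term."""
        for term in set(text.lower().split()):
            bisect.insort(self.postings[term], (-rank, msg_id))

    def top_k(self, terms, k):
        """Return the k highest-ranked matches and the number of postings read."""
        heap = []       # min-heap of (rank, msg_id); heap[0] holds the threshold
        seen = set()
        scanned = 0
        for term in terms:
            for neg_rank, msg_id in self.postings.get(term.lower(), []):
                rank = -neg_rank
                # Once k results are held, no later posting in this list can
                # beat the current k-th best rank, so the scan stops early.
                if len(heap) == k and rank <= heap[0][0]:
                    break
                scanned += 1
                if msg_id in seen:
                    continue
                seen.add(msg_id)
                if len(heap) < k:
                    heapq.heappush(heap, (rank, msg_id))
                else:
                    heapq.heapreplace(heap, (rank, msg_id))
        return sorted(heap, reverse=True), scanned


idx = RankedIndex()
idx.add(1, "hello world", 5.0)
idx.add(2, "hello twitter", 9.0)
idx.add(3, "world news", 7.0)
idx.add(4, "hello again", 1.0)
results, scanned = idx.top_k(["hello", "world"], 2)
```

In this toy run the index holds five postings for the two query terms, but the threshold check lets the search stop after reading three of them. A real system would also need to combine content relevance with rank and to bound the in-memory index size, which this sketch leaves out.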
Source
Computer Engineering and Design (《计算机工程与设计》)
PKU Core Journal
2015, No. 5, pp. 1362-1367 (6 pages)
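Funding
Major Fund Project of the Ministry of Public Security (201202ZDYJ017)
Key Science and Technology Research Fund Project of the Henan Provincial Department of Education (14A520011)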
Keywords
microblog
information retrieval
indexing schema
threshold
social network