Query processing strategies based on inverted indexes (倒排索引查询处理技术)
Cited by: 5
Abstract: This paper introduces the basic structure of an index system and the two classic query processing strategies, DAAT (document-at-a-time) and TAAT (term-at-a-time), and gives implementation details of the query processing algorithms for the two Boolean operators AND and OR. The analysis shows that, for very large indexes, DAAT index traversal outperforms TAAT index traversal, and the performance gap between OR queries and AND queries widens further, with OR far slower than AND. Several groups of experiments on the TREC WT2G and GOV2 datasets confirm these conclusions. Finally, directions for future research on search engine query processing over very large indexes are outlined.
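The abstract names the two traversal orders without reproducing the paper's algorithms. As a rough illustration only, the following Python sketch shows one common way to write a DAAT-style AND query and a TAAT-style OR query over docID-sorted posting lists. The toy index, the function names `daat_and` and `taat_or`, and the use of a match count in place of a real relevance score are all assumptions made for this example, not details taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy index: term -> posting list of docIDs in ascending order.
INDEX = {
    "search": [1, 3, 5, 8, 13],
    "engine": [2, 3, 8, 9, 13, 21],
    "index": [3, 4, 8, 21],
}


def daat_and(index, terms):
    """Document-at-a-time AND: move one cursor per posting list in step."""
    lists = [index.get(t, []) for t in terms]
    if not lists or any(not pl for pl in lists):
        return []
    pos = [0] * len(lists)
    result = []
    # Stop as soon as any list is exhausted: no further document can match.
    while all(p < len(pl) for p, pl in zip(pos, lists)):
        current = [pl[p] for p, pl in zip(pos, lists)]
        target = max(current)
        if all(doc == target for doc in current):
            result.append(target)
            pos = [p + 1 for p in pos]
        else:
            # Advance every cursor that lags behind the largest docID.
            # A skip pointer would replace this linear scan with a jump.
            for i, pl in enumerate(lists):
                while pos[i] < len(pl) and pl[pos[i]] < target:
                    pos[i] += 1
    return result


def taat_or(index, terms):
    """Term-at-a-time OR: scan one full posting list per term into accumulators."""
    accumulators = defaultdict(int)
    for term in terms:
        for doc in index.get(term, []):
            accumulators[doc] += 1  # stand-in for adding a partial score
    return sorted(accumulators)


if __name__ == "__main__":
    print(daat_and(INDEX, ["search", "engine", "index"]))
    print(taat_or(INDEX, ["search", "index"]))
```

In this sketch the AND query can stop when the shortest list runs out, while the OR query must touch every posting of every term and keep an accumulator for each document it sees, which is in line with the abstract's observation that OR queries are much more expensive than AND queries on large indexes.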
Source: Computer Engineering and Design (《计算机工程与设计》), Peking University Core Journal, 2015, No. 3, pp. 572-575, 580 (5 pages)
Funding: National Natural Science Foundation of China (61170286)
Keywords: search engine; inverted index; skipping pointer; query processing; Boolean query

同被引文献29

引证文献5

二级引证文献3

相关作者

内容加载中请稍等...

相关机构

内容加载中请稍等...

相关主题

内容加载中请稍等...

浏览历史

内容加载中请稍等...
;
使用帮助 返回顶部