Abstract
This paper gives a brief overview of index structure and the classic query processing strategies DAAT (document-at-a-time) and TAAT (term-at-a-time), and presents implementation details of both strategies for the Boolean operators AND and OR. The analysis shows that, for queries over very large indexes, DAAT index traversal is more efficient than TAAT, and that the performance gap between OR and AND queries widens further, with OR considerably slower than AND. Several groups of experiments on the TREC WT2G and GOV2 datasets confirm the analytic conclusions. Finally, directions for future research on search engine query processing over large-scale indexes are outlined.
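To make the two traversal strategies named in the abstract concrete, here is a minimal sketch in Python. It is not the paper's implementation: the toy index `INDEX`, the function names `daat_and` and `taat_or`, and the use of plain sorted lists of document IDs are all assumptions made for illustration, and the linear cursor advance stands in for the skipping pointers a real index would use.

```python
from collections import defaultdict

# Hypothetical toy inverted index: term -> sorted list of document IDs.
INDEX = {
    "search": [1, 3, 5, 8, 13],
    "engine": [2, 3, 8, 9, 13, 21],
    "index": [3, 4, 8, 21],
}


def daat_and(index, terms):
    """Document-at-a-time AND: walk all posting lists in lockstep."""
    lists = [index.get(t, []) for t in terms]
    if not lists or any(not pl for pl in lists):
        return []
    cursors = [0] * len(lists)
    result = []
    while all(c < len(pl) for c, pl in zip(cursors, lists)):
        current = [pl[c] for c, pl in zip(cursors, lists)]
        target = max(current)
        if all(doc_id == target for doc_id in current):
            # Every list points at the same document: it matches all terms.
            result.append(target)
            cursors = [c + 1 for c in cursors]
        else:
            # Advance each lagging cursor to the first doc ID >= target.
            # Assumption: a linear scan replaces skip-pointer jumps here.
            for i, pl in enumerate(lists):
                while cursors[i] < len(pl) and pl[cursors[i]] < target:
                    cursors[i] += 1
    return result


def taat_or(index, terms):
    """Term-at-a-time OR: consume one whole posting list after another."""
    accumulators = defaultdict(int)  # doc ID -> number of query terms matched
    for term in terms:
        for doc_id in index.get(term, []):
            accumulators[doc_id] += 1
    return sorted(accumulators)


if __name__ == "__main__":
    print(daat_and(INDEX, ["search", "engine", "index"]))
    print(taat_or(INDEX, ["search", "index"]))
```

In this sketch the DAAT routine holds only one cursor per query term, while the TAAT routine keeps an accumulator entry for every document that appears in any processed posting list.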
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2015, Issue 3, pp. 572-575, 580 (5 pages in total)
Funding
National Natural Science Foundation of China (Grant No. 61170286)
Keywords
search engine
inverted index
skipping pointer
query processing
Boolean query