Abstract
To address the query-performance and scalability problems of index organization strategies in search engines, a hybrid distributed index organization strategy named Loc-Glob is proposed. Loc-Glob integrates the two well-studied index partitioning schemes widely used in search engines: local and global index organization. First, the index servers are logically divided into several index server pools (clusters), and the index is partitioned across the pools according to the local (or global) index organization strategy, treating each pool as a single machine. Then, inside each pool, the index assigned to that pool is further partitioned across its index servers according to the global (or local) index organization strategy. Loc-Glob is more scalable than either traditional strategy and can therefore better accommodate the explosive growth of web pages. Experimental results indicate that Loc-Glob improves on the global index organization in both query performance (throughput) and load balancing, that its query performance is very close to that of the local index organization, and that it achieves a high level of load balancing.
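The two-level partitioning described in the abstract can be illustrated with a minimal sketch. It shows only one of the two orderings (local across pools, then global inside each pool). The function names `build_loc_glob` and `query`, the modulo assignment of documents to pools, and the CRC32 hash that assigns terms to servers are all assumptions made for illustration; the abstract does not say how the paper assigns documents or terms.

```python
import zlib
from collections import defaultdict


def stable_hash(key: str) -> int:
    """Deterministic hash (CRC32), an illustrative choice, not from the paper."""
    return zlib.crc32(key.encode("utf-8"))


def build_loc_glob(docs, num_pools, servers_per_pool):
    """Build a two-level inverted index.

    docs maps an integer doc_id to its list of terms.
    Returns index[pool][server][term] -> sorted list of doc_ids.
    """
    index = [
        [defaultdict(list) for _ in range(servers_per_pool)]
        for _ in range(num_pools)
    ]
    for doc_id in sorted(docs):
        # Level 1, local (document-based): each document belongs to one pool.
        pool = doc_id % num_pools  # assumed assignment rule
        for term in sorted(set(docs[doc_id])):
            # Level 2, global (term-based): inside the pool, each term's
            # posting list lives on exactly one server.
            server = stable_hash(term) % servers_per_pool  # assumed rule
            index[pool][server][term].append(doc_id)
    return index


def query(index, term):
    """Look up a single term: contact one server in every pool."""
    servers_per_pool = len(index[0])
    server = stable_hash(term) % servers_per_pool
    hits = []
    for pool in index:
        hits.extend(pool[server].get(term, []))
    return sorted(hits)
```

Under these assumptions, a single-term query contacts one server per pool instead of every server (as a purely local organization would), while each posting list is split across pools instead of sitting on one server (as a purely global organization would).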
Source
《浙江大学学报(工学版)》
EI
CAS
CSCD
PKU Core (Peking University Core Journals)
2009, Issue 8, pp. 1361-1366 (6 pages)
Journal of Zhejiang University: Engineering Science
Funding
National Basic Research Program of China ("973" Program), grant 2006CB303000
Keywords
search engine
inverted index
distributed index organization
query performance
load balancing