Abstract

T-overlap query is the basis of set similarity query and has been applied in many important fields. Most existing approaches employ a pruning-and-verification framework and are therefore inefficient. Modern GPUs offer much higher parallelism and memory bandwidth than CPUs and can be used to accelerate T-overlap queries. In this paper, we use hash segmentation to divide inverted lists into segments. Using this segmentation, we then design an efficient GPU inverted index called GHSII. Based on GHSII, we propose a new segmentation-based parallel T-overlap algorithm, GSPS. GSPS scans the inverted lists one segment at a time and uses shared memory to reduce the number of accesses to device memory. Furthermore, we propose an optimized algorithm, GSPS-TLLO, which uses a heuristic query order to address load imbalance. Experiments on two real datasets show that GSPS-TLLO outperforms the state-of-the-art GPU-parallel T-overlap algorithms.
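The following is a minimal CPU-side sketch in Python that makes hash segmentation and segment-at-a-time scanning concrete. It is not the authors' GHSII or GSPS. It assumes the following:

- A T-overlap query returns every indexed set that shares at least T distinct elements with the query set.
- The segmentation function is a simple modulo hash, `sid % num_segments`.
- The names `build_segmented_index` and `t_overlap_query` are hypothetical.

Each set id falls into exactly one segment, so the counters for one segment need only `ceil(N / num_segments)` slots. That bound is what would let a GPU thread block keep its counters in shared memory. In this sketch, a plain Python list stands in for that buffer.

```python
import math
from collections import defaultdict


def build_segmented_index(sets, num_segments):
    """Build an inverted index in which each token's posting list is split
    into num_segments pieces by a (hypothetical) modulo hash of the set id.

    Returns a dict: token -> list of num_segments sorted lists of set ids.
    """
    index = defaultdict(lambda: [[] for _ in range(num_segments)])
    for sid, elements in enumerate(sets):
        seg = sid % num_segments          # assumed segmentation hash
        for tok in set(elements):         # count each distinct element once
            index[tok][seg].append(sid)   # ids arrive in ascending order
    return dict(index)


def t_overlap_query(index, num_sets, num_segments, query, T):
    """Return the ids of all sets sharing at least T distinct elements with query."""
    assert T >= 1, "T must be positive"
    seg_size = math.ceil(num_sets / num_segments)
    query_tokens = set(query)
    results = []
    for seg in range(num_segments):       # scan one segment at a time
        # Small per-segment counter array; on a GPU this would live in shared memory.
        counts = [0] * seg_size
        for tok in query_tokens:
            postings = index.get(tok)
            if postings is None:
                continue
            for sid in postings[seg]:
                counts[sid // num_segments] += 1   # local slot within the segment
        for local, c in enumerate(counts):
            if c >= T:
                results.append(local * num_segments + seg)  # map back to global id
    return sorted(results)


if __name__ == "__main__":
    sets = [["a", "b", "c"], ["b", "c", "d"], ["a", "d"], ["a", "b", "c", "d"], ["e"]]
    idx = build_segmented_index(sets, num_segments=2)
    print(t_overlap_query(idx, len(sets), 2, ["a", "b", "c"], T=2))  # → [0, 1, 3]
```

In this sketch, the counters for one segment are fully settled before the next segment is touched. The per-segment buffer therefore stays small no matter how many sets are indexed. A real GPU version would also need to parallelize the inner posting-list loop across threads, which is omitted here.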
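The abstract does not specify how GSPS-TLLO orders queries. To illustrate the general idea only, the sketch below swaps in a standard longest-processing-time-first (LPT) heuristic. It makes these assumptions:

- A query's cost is roughly the total length of the inverted lists it touches.
- Queries are sorted by that estimate in descending order.
- Each query is greedily assigned to the currently least-loaded worker.
- The functions `estimate_cost` and `lpt_schedule` and the `list_lengths` map are hypothetical.

```python
import heapq


def estimate_cost(query, list_lengths):
    """Assumed cost model: total length of the inverted lists a query scans."""
    return sum(list_lengths.get(tok, 0) for tok in set(query))


def lpt_schedule(queries, list_lengths, num_workers):
    """Assign queries to workers, most expensive first, each to the least-loaded worker.

    Returns (assignment, loads): assignment[w] lists query indices for worker w,
    loads[w] is that worker's total estimated cost.
    """
    assert num_workers >= 1
    costs = [estimate_cost(q, list_lengths) for q in queries]
    order = sorted(range(len(queries)), key=lambda i: costs[i], reverse=True)
    heap = [(0, w) for w in range(num_workers)]   # (current load, worker id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_workers)]
    loads = [0] * num_workers
    for qi in order:
        load, w = heapq.heappop(heap)             # least-loaded worker
        assignment[w].append(qi)
        loads[w] = load + costs[qi]
        heapq.heappush(heap, (loads[w], w))
    return assignment, loads


if __name__ == "__main__":
    list_lengths = {"a": 8, "b": 5, "c": 4, "d": 3, "e": 1}
    queries = [["a"], ["b"], ["c"], ["d"], ["e"], ["a", "e"]]
    assignment, loads = lpt_schedule(queries, list_lengths, num_workers=2)
    print(loads)  # → [16, 14]
```

In this toy example, a naive round-robin in input order would give loads of 13 and 17. Placing the heaviest queries first narrows the gap, which is the kind of imbalance a heuristic query order targets.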