摘要
提出了一种新的算法,来解决在分布式的环境中top-k求解问题(求出全局数值最大的前k名)。之前的研究,例如TA、TPUT、HT算法,都会消耗大量的带宽。KLEE算法虽然能够大大地减少带宽的消耗,却不能给出精确解。而提出的算法FT由于添加了一个预处理阶段并且使用了histogram bloom技术,即能有效地减少带宽的消耗,又能给出精确解。实现了FT和相关的算法,并进行了全面的比较。比较是建立在真实的数据集和根据不同情况合成的数据集的基础上的。实验结果显示FT在带宽消耗上面,相对于其他算法有很大的改进和优势。
A new algorithm FT to answer top-k queries(find the k objects with the highest aggregate scores) in a distributed net-work is proposed.Prior research such as Threshlod Algorithm(TA),Three-Phase Uniform-Threshold(TPUT) algorithm and Hybrid-Threshold(HT) algorithm consume an excessive amount of bandwidth.KLEE algorithm can reduce network bandwidth consumption but it can't give exact top-k answers.The algorithm FT can both reduce network bandwidth and give the exact top-k answer based on the pretreatment in the first phase and the histogram bloom.This paper implements FT and related algorithms and conducts a comprehensive performance evaluation.Evaluation employs real-world and synthetic data sets.The experiment shows that FT can achieve major performance in terms of network bandwidth.
出处
《计算机工程与应用》
CSCD
北大核心
2010年第18期89-92,共4页
Computer Engineering and Applications
基金
中科院创新基金重点支持项目~~