摘要
针对NRA算法和BPA算法查询效率不高、重复访问数据的问题,提出了一种基于三维分档布鲁姆过滤器表(TF)的Top-k查询算法(TKBFP).该算法使用TF对数据进行处理,以较低的误判率获得较高的访问效率并降低了内存消耗,利用最优位置索引策略,避免重复访问数据对象.对TKBFP算法进行了严格的语义定义,推导出每一维BF中需要的哈希函数个数;以自主开发的Java程序为仿真平台,对TKBFP算法进行了试验,并对算法执行效率和存储性能进行评价.试验结果表明,该算法能够避免重复访问数据对象,并能以较低的误判率实现大规模数据的高效查询处理.与NRA和BPA相比,当属性列表超过4个时,开销明显降低,适合面向大规模数据的查询处理应用.
To solve the problems of low query efficiency and multiple access to objects in NRA algorithm and BPA algorithms, an efficient top-κ query algorithm (TKBFP) was proposed based on three-dimen-sional bracket bloom filter (TF). TF was used for data processing in the algorithm to realize low false positive rate and access efficiency with reduced memory consumption. The optimal location index strategy was adopted to avoid multiple access to data objects. A strict semantic definition of TKBFP algorithm was given to deduce the number of hash function in each BF. The TKBFP algorithm was tested by the simula-tion platform of independent development java program to evaluate the efficiency and the storage perfor-mance of the algorithm. The results show that the algorithm can effectively avoid multiple access to data objects to obtain efficient query processing for large-scale data with low false positive rate. Compared with NRA and BPA, the proposed algorithm cost is reduced significantly when the attribute list number are more than four. The proposed algorithm is suitable for large-scale data query processing.
出处
《江苏大学学报(自然科学版)》
EI
CAS
北大核心
2012年第5期555-560,共6页
Journal of Jiangsu University:Natural Science Edition
基金
国家自然科学基金资助项目(60773049)
江苏省研究生科研创新计划项目(CX07B-125z)
江苏省教育厅自然科学基金资助项目(11KJB520003)
江苏大学高级专业人才科研基金资助项目(09JDG035)