摘要
海量数据的高效表示和查找成为目前存储系统面临的重要挑战.针对存储系统中大规模动态数据集的表示和查找效率问题,提出一种多路平衡型矩阵Bloom Filter结构(M-BMBF)及其插入和查询算法.M-BMBF根据数据集合大小建立一个r×m矩阵型Bloom Filter,设计多个定位哈希函数将该矩阵Bloom Filter分为多组(多路)以实现平衡插入和高效查询操作.为减缓Bloom Filter中比特的消耗速度,使用一种"最长位匹配"填充算法,新元素的插入将从多路备选Bloom Filter中选择新置为1比特个数最少的Bloom Filter中进行.实验结果表明,相较典型拆分Bloom Filter,M-BMBF能在维持算法消耗时间为常量的基础上,有效节省存储空间,降低误判率.
Aiming at solving the representation and query efficiency in massive and dynamic dataset on storage system,a Multi-group Balance Matrix Bloom Filter(M-BMBF)and the algorithms on insertion and searching of data element were proposed.M-BMBF initiates a r×m matrix Bloom filter according to the size of dataset,and it introduces multiple located hash functions which can be used to divide the matrix Bloom filter into multi-group to achieve balanced insertion and efficient query operations.In order to slow down the bits consumption rate in Bloom filter when a new element is inserted,a longest-bit match filling algorithm was proposed,which selects a Bloom filter as the destination position for insertion from the candidate Bloom filters according to the rule that fewest bits will be changed due to this insertion operation.Experiment results show that compared with the classical Split Bloom Filter,M-BMBF can efficiently save storage space and decrease the misjudgment rate,while its time consume is constant.
作者
杨磊
黄建智
YANG Lei;HUANG Jianzhi(College of Information Science and Engineering, Hunan University, Changsha 410082, China)
出处
《湖南大学学报(自然科学版)》
EI
CAS
CSCD
北大核心
2018年第2期133-140,共8页
Journal of Hunan University:Natural Sciences
基金
国家重点研发计划"高性能计算"专项资助项目(2017YFB0202901)
湖南省自然科学基金资助项目(2015JJ2035)
中央高校基本科研业务费资助项目~~