Abstract
To address the storage of RDF data, this paper proposes a storage scheme built on the distributed database HBase and a purpose-designed Rowkey, drawing on the characteristics of both HBase and RDF data. The scheme hashes each predicate with the classic BKDRHash algorithm and uses the hash value together with the predicate as the primary key under which the data is stored. Setting the HBase Rowkey in this way avoids data piling up on individual nodes, and the use of BKDRHash also ensures the integrity of the data. To demonstrate the effectiveness of this storage model, the experiments use MapReduce to generate files in HFile, HBase's internal storage format, and load them in parallel. The experiments show that, with this storage model, data loading performs well when the data volume is large. Simulation experiments on the LUBM test set show that the scheme is effective.
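The Rowkey construction described in the abstract can be illustrated with a minimal sketch. It assumes the common form of BKDRHash (seed 131, 32-bit arithmetic, result masked to 31 bits). The function names, the zero-padded decimal prefix, and the `|` separator are hypothetical choices made for illustration; the abstract does not specify the exact key layout used in the paper.

```python
def bkdr_hash(text: str, seed: int = 131) -> int:
    """Common BKDRHash variant: hash = hash * seed + byte, kept to 32 bits,
    with the final value masked to a non-negative 31-bit integer."""
    h = 0
    for byte in text.encode("utf-8"):
        h = (h * seed + byte) & 0xFFFFFFFF  # emulate unsigned 32-bit overflow
    return h & 0x7FFFFFFF


def make_rowkey(predicate: str) -> bytes:
    """Hypothetical layout: fixed-width hash prefix, separator, predicate.

    The 10-digit zero padding keeps keys equal in prefix length, so HBase's
    lexicographic ordering spreads predicates by hash value rather than by
    shared URI prefixes.
    """
    return f"{bkdr_hash(predicate):010d}|{predicate}".encode("utf-8")


if __name__ == "__main__":
    # Example predicate URI; any RDF predicate string would work here.
    print(make_rowkey("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"))
```

Keeping the predicate itself after the hash prefix means two predicates that happen to collide under BKDRHash still map to distinct keys, which is one plausible reading of how the scheme preserves data integrity.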
Source
《信息网络安全》
2016, No. 3, pp. 59-63 (5 pages)
Netinfo Security
Funding
黔科合JZ字[2014]2001