The parallel retrieval of massive compound based on discretized segment hash (离散化分段哈希的海量化合物并行检索)
Abstract: To address the inefficiency of single-machine retrieval over massive data, this paper builds a distributed computing model for fast retrieval of massive compound collections and proposes a segmented hash algorithm based on a divide-and-conquer strategy. For continuous numeric attributes that are poorly suited to hash lookup, such as molecular weight and the lipid-water partition coefficient (logP), a continuous-attribute discretization model is designed to discretize them. Experimental results show that, when searching large compound files, the model retrieves range information quickly and effectively, avoids repeated retrieval over the massive data, and greatly reduces the memory and time needed for compound retrieval, while remaining stably scalable and efficient.
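The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes equal-width bins for discretization, in-memory Python dictionaries as the per-segment hash tables, and a sequential loop standing in for the distributed workers. The names `bin_index`, `build_segment_index`, `range_query`, and the sample records are all hypothetical.

```python
import math
from collections import defaultdict


def bin_index(value, low, width):
    """Map a continuous value to an integer bin (equal-width discretization)."""
    return math.floor((value - low) / width)


def build_segment_index(records, key, low, width):
    """Hash one segment of the data by the discretized value of `key`."""
    index = defaultdict(list)
    for rec in records:
        index[bin_index(rec[key], low, width)].append(rec)
    return index


def range_query(indexes, key, lo, hi, low, width):
    """Return records with lo <= rec[key] <= hi from all segment indexes."""
    first = bin_index(lo, low, width)
    last = bin_index(hi, low, width)
    hits = []
    for index in indexes:  # in a distributed setting, one worker per segment
        for b in range(first, last + 1):
            for rec in index.get(b, ()):
                # Edge bins may hold values outside the range, so filter exactly.
                if lo <= rec[key] <= hi:
                    hits.append(rec)
    return hits


# Hypothetical sample data: molecular weights of four compounds.
compounds = [
    {"id": "c1", "mw": 180.16},
    {"id": "c2", "mw": 46.07},
    {"id": "c3", "mw": 194.19},
    {"id": "c4", "mw": 342.30},
]

# Divide and conquer: split the data into segments and index each one separately.
segments = [compounds[:2], compounds[2:]]
indexes = [build_segment_index(s, "mw", low=0.0, width=50.0) for s in segments]

result = range_query(indexes, "mw", 150.0, 200.0, low=0.0, width=50.0)
print([r["id"] for r in result])
```

Because each segment index is built once and then probed only at the bins that overlap the query range, a range search touches a few hash buckets per segment instead of rescanning the whole file. That is the kind of saving the abstract attributes to the discretized segment hash.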
Source: Computers and Applied Chemistry (《计算机与应用化学》), CAS, 2015, No. 7, pp. 885-888 (4 pages)
Keywords: parallel computation; chemoinformatics; massive data; discretization of continuous features; hash