Abstract
Relational database cartridges can generate compound descriptors (fingerprints), build indexes on them, and perform substructure searches over compounds. Using Molfile records of organic compounds from PubChem as the data source, this work builds molecular structure databases on Oracle with two cartridges, OrChem (Java) and Bingo (C++). The main differences between OrChem and Bingo for two-dimensional substructure searching are discussed in terms of fingerprint composition and indexing strategy, and search efficiency is tested with 10 representative query substructures on databases of 400,000 and 26 million compounds. On the database of 400,000 compounds, OrChem responds quickly enough for practical use, while Bingo is faster. On the database of 26 million compounds, Bingo delivers efficient searching that meets user needs after tuning Oracle memory management, the table structure, and the substructure pre-screening parameters.
Source
Computers and Applied Chemistry (《计算机与应用化学》)
CAS
CSCD
Peking University Core Journals (北大核心)
2011, Issue 11, pp. 1419-1423 (5 pages)
Funding
Chinese Academy of Sciences Special Project on Informatization (INF-115-C01-SDB3-03)
Keywords
molecular structure database
substructure searching
relational database cartridges
pre-screening