Abstract
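In chemistry and related research, it is often necessary to look up a compound's structure from its CAS Registry Number. This paper integrates data already held at the Shandong Provincial Research Center for Bioinformatics Engineering and Technique to build a database that maps compound CAS Registry Numbers to their structures, for use in research. Data for a total of 575,468 compounds were first exported from seven databases, including CMC, MDDR, ACD, CNPD and NCI; after processing, these data were imported into the ChemFinder chemical database system. After duplicate checking, compounds with 404,269 unique CAS Registry Numbers were retained, each with its structure, CAS Registry Number, source database and identifier, molecular formula, molecular weight, lipid-water partition coefficient and other information. The database is also kept in the sdf and mol2 file formats to meet the needs of follow-up work such as virtual screening.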
A CAS Registry Number is a unique serial number assigned by Chemical Abstracts Service to a chemical compound. It is a useful tool for searching for compounds that have two or more names. In this paper, a database of structures and their corresponding CAS Registry Numbers was established by merging several databases held at the Shandong Provincial Research Center for Bioinformatics Engineering and Technique, namely CMC, MDDR, ACD, CNPD, NCI and ChemIndex. A total of 575,468 compounds with their corresponding CAS Registry Numbers were exported and processed. The data were then imported into the ChemFinder system and duplicates were removed. The final database contains 404,269 compounds with their CAS Registry Numbers, source databases, molecular formulas and other properties. In addition, two further file formats (mol2 and sdf) were retained for in-house virtual screening.
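The duplicate-removal step described above can be illustrated with a minimal Python sketch. It is not the authors' procedure, which used the ChemFinder system; it only shows one plausible way to merge exported records on their CAS Registry Number. The record fields (`cas`, `database`, `source_id`) and the sample identifiers are hypothetical. The check-digit rule is the standard one for CAS Registry Numbers: the digits before the check digit are weighted 1, 2, 3, ... from right to left, and the sum modulo 10 must equal the check digit.

```python
import re

# CAS Registry Number layout: 2-7 digits, 2 digits, 1 check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas: str) -> bool:
    """Return True if the string is well formed and its check digit matches."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


def merge_by_cas(records):
    """Keep one entry per valid CAS number and collect every source reference.

    `records` is an iterable of dicts with the hypothetical keys
    'cas', 'database' and 'source_id'. Records with an invalid CAS
    number are skipped.
    """
    merged = {}
    for record in records:
        cas = record["cas"].strip()
        if not is_valid_cas(cas):
            continue
        entry = merged.setdefault(cas, {"cas": cas, "sources": []})
        entry["sources"].append((record["database"], record["source_id"]))
    return merged


# Illustrative records only; the source identifiers are made up.
sample_records = [
    {"cas": "58-08-2", "database": "CMC", "source_id": "example-1"},
    {"cas": "58-08-2", "database": "ACD", "source_id": "example-2"},
    {"cas": "7732-18-5", "database": "NCI", "source_id": "example-3"},
    {"cas": "7732-18-4", "database": "NCI", "source_id": "example-4"},
]
merged_records = merge_by_cas(sample_records)
```

Keeping the list of `(database, source_id)` pairs on each merged entry mirrors the abstract's statement that every compound retains its source database and identifier after duplicates are removed.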
Source
Computers and Applied Chemistry (《计算机与应用化学》)
CAS
CSCD
PKU Core Journals
2008, Issue 11, pp. 1432-1434 (3 pages)
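Funding
Doctoral Research Startup Fund of Shandong University of Technology (4041-405019)
Characteristic Discipline Construction Project of the School of Life Sciences, Shandong University of Technology.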