Abstract
Virtual screening is increasingly used as a cost-effective complement to high-throughput screening. When the structure of the target macromolecule is unknown, ligand-based virtual screening is typically used, and similarity methods play a key role in it. Many distance and similarity measures and similarity coefficients are in common use in chemical information systems. In this work, the widely used Daylight fingerprint was adopted as the chemical structure representation, and the Tanimoto coefficient computed on these fingerprints with the Chemistry Development Kit (CDK) served as the molecular similarity measure. Hierarchical agglomerative clustering was performed on the effective compounds in the Traditional Chinese Medicine (TCM) database, with chemical-structure domain knowledge applied to preprocess the molecules before clustering. A similarity threshold of 0.75 was used for the clustering. The penalty value of the Kelly method was computed during the hierarchical clustering to determine the optimal number of clusters, and the number obtained was very close to the number of clusters produced with the 0.75 threshold. For each cluster containing more than one compound, multiple representative molecules were selected. The shortcomings of the Tanimoto coefficient were also analyzed in light of the clustering results. Future work may include scaffold analysis and diversity analysis of the TCM database, as well as scaffold-based clustering.
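For illustration only, the similarity measure and threshold-based clustering described in the abstract can be sketched in a few lines of Python. This is not the authors' implementation: the study used CDK with Daylight fingerprints, whereas this sketch assumes each fingerprint is given as a set of on-bit positions, assumes a complete-linkage merge rule (the abstract does not state the linkage), and uses the hypothetical function names `tanimoto` and `cluster`.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient c / (|a| + |b| - c) over sets of on-bit positions."""
    if not a and not b:
        return 1.0  # convention assumed here for two empty fingerprints
    common = len(a & b)
    return common / (len(a) + len(b) - common)


def cluster(fps, threshold=0.75):
    """Agglomerative clustering with complete linkage (an assumption).

    Repeatedly merges the two clusters whose least-similar member pair is
    most similar, and stops once that value falls below `threshold`.
    Returns a list of clusters, each a sorted list of indices into `fps`.
    """
    clusters = [[i] for i in range(len(fps))]

    def linkage(c1, c2):
        # Complete linkage: similarity of the least-similar cross pair.
        return min(tanimoto(fps[i], fps[j]) for i in c1 for j in c2)

    while len(clusters) > 1:
        (x, y), best = max(
            (((x, y), linkage(clusters[x], clusters[y]))
             for x, y in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break  # no remaining pair of clusters is similar enough
        clusters[x] = sorted(clusters[x] + clusters[y])
        del clusters[y]  # safe: combinations guarantees x < y
    return clusters


# Toy fingerprints (made-up bit positions, not real Daylight fingerprints).
example = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({10, 11, 12}),
    frozenset({20, 21}),
]
print(cluster(example))
```

The pairwise search makes this sketch cubic or worse in the number of molecules, so it is only suitable for small toy inputs; a database-scale run would need a precomputed similarity matrix or a library implementation.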
Source
Computers and Applied Chemistry (《计算机与应用化学》)
Indexed in: CAS, CSCD, Peking University Core Journals (北大核心)
2013, Issue 6, pp. 575-581 (7 pages)
Funding
National Natural Science Foundation of China (40672104)
Beijing Municipal Education Commission Scientific & Technological Development Plan Foundation (KM201211417002)
Importation and Development of High-Caliber Talents Project of Beijing Municipal Institutions (CIT&TCD 201304090)
Funding Project for Academic Human Resources Development in Beijing Union University (BPHR2011A04, BPHR2012F01)
Keywords
hierarchical clustering, traditional Chinese medicine (TCM), molecular fingerprint, virtual screening, Ward's method, Tanimoto coefficient