Abstract
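Objective: To achieve accurate and efficient medicine information retrieval, this paper explores a graph-structure-based retrieval method for drug molecules. Methods: The method takes photos or hand-drawn sketches captured on smart devices as input and formalizes the input structural formula as a corresponding graph. Based on an analysis of the factors that directly affect graph-matching efficiency, a compact and effective hypergraph representation of structural formulas is established; following the characteristics of structural formulas, it combines subgraph matching with frequent subgraph mining to collapse the large graph over multiple levels. To prevent subgraph overlap during collapsing from hindering accurate hypergraph construction, a graph-isomorphism-based algorithm is introduced that analyzes the overlap among subgraphs, selects the dominant subgraphs, and uses multi-dimensional information to complete precise molecule matching. Results: To demonstrate the effectiveness of the method, its retrieval accuracy was compared with that of Wikipedia Chemical Structure Explorer (WCSE). The results show that the proposed method is more accurate: for the top 10 retrieved results, all four metrics, MAP (mean average precision), DCG (discounted cumulative gain), RBP (rank-biased precision), and ERR (expected reciprocal rank), are higher than those of WCSE, with leads of 10%, 1.41, 6.42%, and 1.32%, respectively. A further intuitive comparison using concrete retrieval examples from the two systems shows that the proposed method is more effective for drug molecule retrieval and provides users with more satisfactory results. Conclusion: The proposed graph-structure-similarity-based retrieval method for drug molecules achieves fairly good retrieval results, and experiments demonstrate the feasibility and effectiveness of the retrieval system.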
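The collapsing step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names, the toy toluene graph, and above all the "largest candidate wins" rule are assumptions standing in for the graph-isomorphism-based dominant-subgraph selection, whose actual criterion the abstract does not specify.

```python
def select_non_overlapping(candidates):
    """Greedy stand-in for dominant-subgraph selection (assumption:
    a larger candidate dominates any smaller one it overlaps)."""
    chosen, used = [], set()
    for cand in sorted(candidates, key=len, reverse=True):
        if used.isdisjoint(cand):      # keep only candidates sharing no atom
            chosen.append(cand)
            used |= cand
    return chosen


def collapse(adj, groups):
    """Merge each node group into one supernode and return the
    adjacency of the collapsed graph."""
    owner = {}
    for i, group in enumerate(groups):
        for node in group:
            owner[node] = f"S{i}"      # hypothetical supernode naming
    collapsed = {}
    for node, nbrs in adj.items():
        u = owner.get(node, node)
        collapsed.setdefault(u, set())
        for nbr in nbrs:
            v = owner.get(nbr, nbr)
            if u != v:                 # drop edges internal to a supernode
                collapsed[u].add(v)
    return collapsed


# Toy molecule: a six-membered carbon ring (c1..c6) with a methyl carbon c7.
adj = {
    "c1": {"c2", "c6", "c7"},
    "c2": {"c1", "c3"},
    "c3": {"c2", "c4"},
    "c4": {"c3", "c5"},
    "c5": {"c4", "c6"},
    "c6": {"c5", "c1"},
    "c7": {"c1"},
}
ring = frozenset({"c1", "c2", "c3", "c4", "c5", "c6"})
chain = frozenset({"c1", "c7"})        # overlaps the ring on c1

chosen = select_non_overlapping([chain, ring])
collapsed = collapse(adj, chosen)
```

Here the ring and the chain candidate share atom c1, so only the ring is collapsed; the seven-node graph shrinks to a supernode joined to c7, which is the kind of size reduction that makes subsequent graph matching cheaper.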
Objective: To establish a compact and efficient hypergraph representation and a graph-similarity-based retrieval method for molecules, in order to achieve effective and efficient medicine information retrieval. Methods: The chemical structural formula (CSF) is a primary search target in medicine information retrieval because it is a unique and precise identifier for each compound at the molecular level. To retrieve medicine information effectively and efficiently, a complete workflow for a graph-based CSF retrieval system was introduced. The system accepted photos taken with smartphones and sketches drawn on tablet personal computers as CSF inputs, and formalized the CSFs as the corresponding graphs. A compact and efficient hypergraph representation for molecules was then proposed on the basis of an analysis of the factors that directly affect the efficiency of graph matching. According to the characteristics of CSFs, a hierarchical collapsing method combining graph isomorphism and frequent subgraph mining was adopted. A fundamental challenge remained: subgraph overlapping during the collapsing procedure hindered the method from establishing the correct compact hypergraph of an original CSF graph. Therefore, a graph-isomorphism-based algorithm was proposed to select dominant acyclic subgraphs on the basis of overlapping analysis. Finally, the spatial similarity among graphical CSFs was evaluated by multi-dimensional measures of similarity. Results: To evaluate its performance, the proposed system was first compared on retrieval accuracy with Wikipedia Chemical Structure Explorer (WCSE), the state-of-the-art system that allows CSF similarity searching within the Wikipedia molecules dataset. The system achieved higher values than WCSE on mean average precision, discounted cumulative gain, rank-biased precision, and expected reciprocal rank from the top-2 to the top-10 retrieved results. Specifically, for the top-10 retrieval results the system exceeded WCSE by 10%, 1.41, 6.42%, and 1.32% on these metrics, respectively. Moreover, several retrieval cases were presented for an intuitive comparison with WCSE. The results of this comparative study demonstrated that the proposed method outperformed the existing method with regard to accuracy and effectiveness. Conclusion: This paper proposes a graph-similarity-based retrieval approach for medicine information. To obtain satisfactory retrieval results, an isomorphism-based algorithm is proposed for dominant subgraph selection based on subgraph overlapping analysis, together with an effective and efficient hypergraph representation of molecules. Experimental results demonstrate the effectiveness of the proposed approach.
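For readers unfamiliar with the four ranking metrics named in the Results, the following Python sketch shows one common way to compute each at a cutoff k. The exact variants used in the paper are not stated in the abstract, so the choices here are assumptions: average precision is normalized by the number of relevant items found in the top k, DCG uses a log2(rank + 1) discount, RBP uses a persistence parameter `p`, and ERR maps a grade g to a stopping probability (2^g - 1) / 2^max_grade.

```python
import math


def average_precision(rels, k):
    """AP@k for binary relevance labels in ranked order (assumed
    normalization: relevant items found within the top k)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(rels[:k], start=1):
        if rel:
            hits += 1
            total += hits / rank       # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(runs, k):
    """MAP@k: mean of AP@k over the ranked lists of several queries."""
    return sum(average_precision(r, k) for r in runs) / len(runs)


def dcg(gains, k):
    """DCG@k with the log2(rank + 1) discount."""
    return sum(g / math.log2(rank + 1)
               for rank, g in enumerate(gains[:k], start=1))


def rbp(rels, k, p=0.8):
    """Rank-biased precision truncated at k; p is the user persistence."""
    return (1 - p) * sum(rel * p ** (rank - 1)
                         for rank, rel in enumerate(rels[:k], start=1))


def err(grades, k, max_grade=1):
    """Expected reciprocal rank under the cascade user model."""
    not_stopped, score = 1.0, 0.0
    for rank, g in enumerate(grades[:k], start=1):
        stop = (2 ** g - 1) / 2 ** max_grade   # chance the user stops here
        score += not_stopped * stop / rank
        not_stopped *= 1 - stop
    return score


# One hypothetical ranked list: relevant, irrelevant, relevant.
rels = [1, 0, 1]
scores = (average_precision(rels, 10), dcg(rels, 10),
          rbp(rels, 10, p=0.5), err(rels, 10))
```

MAP, RBP, and ERR are bounded by 1 and are commonly reported as percentages, whereas DCG is unbounded, which is consistent with the abstract reporting the DCG margin (1.41) as an absolute value.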
Authors
瞿经纬
吕肖庆
刘振明
廖媛
孙鹏晖
王蓓
汤帜
QU Jing-wei; LV Xiao-qing; LIU Zhen-ming; LIAO Yuan; SUN Peng-hui; WANG Bei; TANG Zhi (Institute of Computer Science & Technology, Peking University, Beijing 100080, China; State Key Laboratory of Digital Publishing Technology, Beijing 100080, China; Department of Medical Chemistry, Peking University School of Pharmaceutical Sciences, Beijing 100191, China)
Source
《北京大学学报(医学版)》
CAS
CSCD
Peking University Core Journal
2018, Issue 2, pp. 368-374 (7 pages)
Journal of Peking University:Health Sciences
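Funding
National Natural Science Foundation of China (61573028, 61673029)
Key Laboratory of Science, Technology and Standards in Press and Publication Industry (Key Laboratory of Intelligent Press Media Technology)
Peking University Seed Fund for Medicine-Information Science Interdisciplinary Research (BMU20160579)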
Keywords
Information storage and retrieval
Molecular structure
Graph structure
Hypergraph
Algorithms