Abstract
To improve the efficiency of new-drug literature retrieval, a distributed new-drug literature retrieval system based on Hadoop was developed. The system comprises two parts: full-text retrieval and chemical structural formula retrieval. Full-text retrieval is implemented on the key technologies Lucene and Hadoop. Chemical structural formula retrieval uses HBase to store the SMILES code and connection table of each structural formula, and matches structural formulas with the graph isomorphism algorithm VF2.
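The structure-matching idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and not the full VF2 algorithm. It is a simplified backtracking matcher in the same spirit: it extends a partial atom mapping one atom at a time and prunes candidates that break label or adjacency consistency. The connection-table layout (a list of atom symbols plus a list of bonded index pairs) and the names `adjacency` and `find_substructure` are assumptions made for illustration. Bond orders, aromaticity, SMILES parsing, and HBase storage are all left out.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical connection-table layout: atom symbols plus bonded index pairs.
Molecule = Tuple[List[str], List[Tuple[int, int]]]


def adjacency(mol: Molecule) -> Dict[int, Set[int]]:
    """Build an undirected adjacency map from a connection table."""
    atoms, bonds = mol
    adj: Dict[int, Set[int]] = {i: set() for i in range(len(atoms))}
    for a, b in bonds:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def find_substructure(query: Molecule, target: Molecule) -> Optional[Dict[int, int]]:
    """Return one query-atom -> target-atom mapping, or None if no match exists."""
    q_atoms, t_atoms = query[0], target[0]
    q_adj, t_adj = adjacency(query), adjacency(target)
    mapping: Dict[int, int] = {}
    used: Set[int] = set()

    def feasible(q: int, t: int) -> bool:
        # Atom symbols must agree, and the target atom needs enough neighbours.
        if q_atoms[q] != t_atoms[t] or len(q_adj[q]) > len(t_adj[t]):
            return False
        # Every already-mapped neighbour of q must be bonded to t in the target.
        return all(mapping[n] in t_adj[t] for n in q_adj[q] if n in mapping)

    def extend(q: int) -> bool:
        if q == len(q_atoms):
            return True  # all query atoms are mapped
        for t in range(len(t_atoms)):
            if t not in used and feasible(q, t):
                mapping[q] = t
                used.add(t)
                if extend(q + 1):
                    return True
                del mapping[q]  # backtrack
                used.discard(t)
        return False

    return dict(mapping) if extend(0) else None


# Ethanol (SMILES "CCO") written by hand as a connection table.
ethanol: Molecule = (["C", "C", "O"], [(0, 1), (1, 2)])
c_o_fragment: Molecule = (["C", "O"], [(0, 1)])
print(find_substructure(c_o_fragment, ethanol))
```

A production matcher would also compare bond types and would use the candidate ordering and look-ahead rules of VF2 to prune the search much more aggressively.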
Source
《医学信息学杂志》
CAS
2016, Issue 5, pp. 73-78 (6 pages)
Journal of Medical Informatics