
GFL: A Graphic Formal Language for Markush Structure Indexing

Cited by: 1
Abstract: To meet the growing demand for patent retrieval, the National High Technology Research and Development Program of China (863 Program) launched the research and development of a generic chemical structure database system. Such a system involves two key technologies: (1) the computer representation of generic chemical structures and (2) retrieval algorithms for generic chemical structures. This paper discusses the computer representation. Generic chemical structures in original chemical patent documents are expressed in a natural language that follows certain conventions. To store and retrieve this information in a computer system, the natural-language description must be converted into an unambiguous formal language that a computer can accept. This process is called the indexing of generic chemical structures. The fragment-based formal languages generally used internationally for this purpose were developed in the 1970s and 1980s. They are far removed from the graphical natural language that chemists use, so indexing is slow and costly. This paper introduces a graphic formal language for indexing generic chemical structures, developed on top of the drawing functions of ISIS/Draw. It is close to the graphical natural language that chemists use every day, and its rules are simple and easy to master, which improves indexing efficiency and lowers the cost of implementing a generic chemical structure database system.

The State Intellectual Property Office of the P. R. C. (SIPO) receives a huge number of chemical patent applications each year. Academic institutions and enterprises have to search a large number of chemical patents in order to protect their own intellectual property and to make use of known technology. In 2004, the National High Technology Research and Development Program of China initiated the generic chemical structure database project as a solution to the challenges of chemical patent processing. The two core technologies of this project are (1) the computer representation of generic chemical structures and (2) retrieval algorithms for generic chemical structures. This article presents new protocols for representing generic chemical structures in a computer system. A generic chemical structure in a chemical patent is described in natural language, which is not well defined. Such natural language has to be formalized before it can be stored, exchanged, and searched in a database system. The formalized language is called a formal language, and indexing is the process of translating a chemical patent from natural language into a formal language. A number of formal languages for generic chemical structures have been reported in past years. Most of them are based on the concept of chemical structure fragmentation. Their main disadvantages are that (1) the syntax is too complicated to learn and (2) the rules differ too much from natural chemical language and are hard to understand. These problems make the chemical patent indexing process very costly.
In this paper, we propose a novel formal language for representing generic chemical structures. It is close to natural chemical language, and its syntax rules are concise and easy to learn. As a result, the new formal language has been well received in the chemical patent indexing process at SIPO.
Source: Journal of the China Society for Scientific and Technical Information (情报学报), indexed in CSSCI and the Peking University Core Journals list, 2007, No. 2, pp. 253-259 (7 pages)
Funding: This project was supported by the National High Technology Research and Development Program of China (2003AA223603).
Keywords: generic chemical structure, Markush structure, indexing, graphic formal language, computer retrieval, Markush database










