Abstract
[Purpose/significance] Given the massive volume of patent documents, enabling users to grasp knowledge quickly and accurately is a key issue in optimizing patent services. Zero anaphora is pervasive in Chinese patent documents and seriously hinders the automatic identification and extraction of knowledge. However, because identifying and resolving zero anaphora in patent documents involves many text-analysis techniques and the construction of specialized resources, no research targeting this problem has been found so far. [Method/process] Guided by Qualia Structure theory, semantic roles, and Rhetorical Structure Theory, this paper studies the relevant rules and develops two tools: a syntactic and semantic role labeling tool and a discourse annotation tool. It also constructs 4 resource libraries: (1) the "library of qualia roles of patent verbs", which classifies patent verbs into 4 categories; (2) the "library of patent knowledge argument structures", which is used to automatically label the qualia roles of patent verbs and their argument structures; (3) the "library of patent verb argument structure rules", which is used to analyze the antecedents of zero anaphora; (4) the "library of rhetorical structure types of zero anaphora", which is used to analyze cases where zero anaphora co-occurs with the "telic role" and the "constitutive role". [Result/conclusion] Building these libraries yielded 5 resolution rules. The preliminary results have been successfully applied to the automatic processing of patent documents in the mechanical field.
Source
《图书情报工作》
CSSCI
Peking University Core Journals (北大核心)
2015, Issue 9, pp. 73-79, 142 (8 pages in total)
Library and Information Service
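Funding
China Postdoctoral Science Foundation project "Research on Chinese Zero Anaphora Resolution for Specialized Literature" (Grant No. 2014M550792)
One of the research outcomes of the National Key Technology R&D Program project "Research on Key Technologies for Mining and Discovery of Patent Information Resources" (Grant No. 2013BAH21B02)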
Keywords
patent
zero anaphora
anaphora resolution
qualia structure
semantic roles
Rhetorical Structure Theory