Abstract
Entity relation extraction from Chinese patent texts has typically relied on traditional features such as lexical, context, and distance features, which leads to low extraction efficiency. To address this problem, a method that combines traditional features with syntactic-semantic features is proposed. Relation extraction from Chinese patent texts is recast as the problem of recognizing SAO structures. After word segmentation and entity tagging, candidate SAO triples are extracted from the patent text, and both traditional features and syntactic-semantic features are extracted for each candidate triple. The XGBoost algorithm is trained on these features and used for prediction, and the effectiveness of the features is analyzed experimentally. The experimental results show that the proposed method clearly outperforms methods that use only traditional features, which verifies the effectiveness of the syntactic-semantic features.
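The candidate-generation and traditional-feature steps named in the abstract can be sketched as follows. This is a minimal illustration only, assuming SAO is read as subject-action-object and that the text has already been segmented and tagged. The tag set (`ENT`, `V`, `O`), the `Token` class, the function names, and the English example sentence are all hypothetical and are not taken from the paper. The syntactic-semantic features are not shown because the abstract does not specify them.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Token:
    text: str
    tag: str  # hypothetical tag set: "ENT" = entity, "V" = verb, "O" = other


def candidate_sao(tokens):
    """Enumerate (subject, action, object) index triples with S < A < O.

    Assumption: every entity-verb-entity combination in sentence order
    is a candidate; the paper may filter candidates differently.
    """
    ents = [i for i, t in enumerate(tokens) if t.tag == "ENT"]
    verbs = [i for i, t in enumerate(tokens) if t.tag == "V"]
    return [(s, a, o) for s, a, o in product(ents, verbs, ents) if s < a < o]


def traditional_features(tokens, triple):
    """Lexical, context, and distance features for one candidate triple."""
    s, a, o = triple
    return {
        # lexical features: the words themselves
        "subj": tokens[s].text,
        "action": tokens[a].text,
        "obj": tokens[o].text,
        # distance features: token offsets between the three elements
        "dist_sa": a - s,
        "dist_ao": o - a,
        # context features: the words just outside the triple span
        "left_ctx": tokens[s - 1].text if s > 0 else "<BOS>",
        "right_ctx": tokens[o + 1].text if o + 1 < len(tokens) else "<EOS>",
    }


# Hypothetical pre-tagged sentence (English stand-in for a patent sentence).
tokens = [
    Token("sensor", "ENT"), Token("detects", "V"), Token("temperature", "ENT"),
    Token("and", "O"),
    Token("controller", "ENT"), Token("adjusts", "V"), Token("valve", "ENT"),
]

candidates = candidate_sao(tokens)
features = [traditional_features(tokens, c) for c in candidates]
```

Each feature dictionary would then be encoded numerically and passed, together with a label saying whether the candidate is a true relation, to a classifier; the abstract names XGBoost for that step.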
Authors
张永真
吕学强
申闫春
徐丽萍
ZHANG Yong-zhen; LYU Xue-qiang; SHEN Yan-chun; XU Li-ping (Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; Laboratory of VR and System Simulation, Beijing Information Science and Technology University, Beijing 100085, China; Beijing Research Center of Urban System Engineering, Beijing 100089, China)
Source
《计算机工程与设计》
Peking University Core Journal list (北大核心)
2019, No. 3, pp. 706-712 (7 pages)
Computer Engineering and Design
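Funding
National Natural Science Foundation of China (61671070)
Beijing Advanced Innovation Center for Imaging Technology Fund (BAICIT-2016003)
Major Program of the National Social Science Foundation of China (15ZDB017)
Key Project of the State Language Commission of China (ZDI135-53)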