Method of extracting patent domain terms based on conditional random fields
Chinese title: 基于CRFs的专利文献领域术语抽取方法; cited by: 11
Abstract: Based on an analysis of the characteristics of terms in Chinese patent documents from the new energy vehicle domain, a term extraction method is proposed that applies a conditional random field (CRF) model to character sequences labeled with three-, four-, and six-position tag sets. Taking the single character as the unit of segmentation avoids term recognition errors introduced by word segmentation during extraction. The paper also examines how the choice of position tag set affects extraction performance. Experimental results show that the CRF model with six-position character tagging performs best, with precision, recall, and F-value higher than those of a baseline that extracts terms using words, part-of-speech tags, word length, and other word-level information as features, which verifies the effectiveness of the proposed method.
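The character-level labeling described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the concrete tag inventories (`B`/`I`/`O` for three positions, `B`/`I`/`E`/`O` for four, `B`/`B2`/`B3`/`M`/`E`/`O` for six), the character-window feature template, the example sentence, and the set-based precision/recall/F scoring are hypothetical choices, and the function names `position_tag`, `encode`, `decode`, `char_features`, and `prf` are invented here. Training the CRF itself is left to a CRF toolkit.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical position-tag inventories, chosen for illustration only; the
# paper's exact three-, four- and six-position tag sets may differ.
TAG_SETS: Dict[int, Tuple[str, ...]] = {
    3: ("B", "I", "O"),
    4: ("B", "I", "E", "O"),
    6: ("B", "B2", "B3", "M", "E", "O"),
}

Span = Tuple[int, int]  # [start, end) character offsets of one term


def position_tag(k: int, length: int, scheme: int) -> str:
    """Tag for the k-th character (0-based) of a term that is `length` characters long."""
    if scheme not in TAG_SETS:
        raise ValueError(f"unsupported scheme: {scheme}")
    if k == 0:
        return "B"
    if scheme == 3:
        return "I"
    if k == length - 1:
        return "E"
    if scheme == 4:
        return "I"
    return {1: "B2", 2: "B3"}.get(k, "M")  # six-position scheme


def encode(text: str, spans: List[Span], scheme: int) -> List[str]:
    """One tag per character; characters outside every term are tagged 'O'."""
    tags = ["O"] * len(text)
    for start, end in spans:
        for k in range(end - start):
            tags[start + k] = position_tag(k, end - start, scheme)
    return tags


def decode(text: str, tags: List[str]) -> List[str]:
    """Recover term strings from a tag sequence (works for all three schemes)."""
    terms: List[str] = []
    start: Optional[int] = None
    for i, tag in enumerate(tags):
        if tag == "B":  # a new term begins; close any open one first
            if start is not None:
                terms.append(text[start:i])
            start = i
        elif tag == "O":  # leaving a term
            if start is not None:
                terms.append(text[start:i])
            start = None
        elif tag == "E" and start is not None:  # explicit term end
            terms.append(text[start:i + 1])
            start = None
        # I, M, B2, B3 extend an open term; inside tags with no open term are ignored
    if start is not None:
        terms.append(text[start:])
    return terms


def char_features(text: str, i: int, window: int = 2) -> Dict[str, str]:
    """Hypothetical template: character unigrams and bigrams within +/-window of i."""
    def ch(j: int) -> str:
        return text[j] if 0 <= j < len(text) else "<PAD>"

    feats = {f"c[{d}]": ch(i + d) for d in range(-window, window + 1)}
    for d in range(-window, window):
        feats[f"c[{d}]c[{d + 1}]"] = ch(i + d) + ch(i + d + 1)
    return feats


def prf(predicted: List[str], gold: List[str]) -> Tuple[float, float, float]:
    """Precision, recall and F1 over the sets of distinct term strings (assumed scoring)."""
    pred, ref = set(predicted), set(gold)
    hits = len(pred & ref)
    p = hits / len(pred) if pred else 0.0
    r = hits / len(ref) if ref else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


if __name__ == "__main__":
    # Made-up sentence: "a battery management system for a lithium-ion battery"
    sentence = "一种锂离子电池的电池管理系统"
    gold_spans = [(2, 7), (8, 14)]  # 锂离子电池, 电池管理系统
    gold_terms = [sentence[s:e] for s, e in gold_spans]
    for scheme in (3, 4, 6):
        tags = encode(sentence, gold_spans, scheme)
        print(scheme, " ".join(tags))
        assert decode(sentence, tags) == gold_terms  # encoding round-trips
    print(char_features(sentence, 2))
    print(prf(["锂离子电池", "电池"], gold_terms))  # (0.5, 0.5, 0.5)
```

In this sketch every unit is a single character, so no word segmenter runs before tagging and a segmentation error cannot split or merge a term. A CRF toolkit would be trained on `char_features` rows paired with the `encode` tags, and its predicted tag sequence turned back into terms with `decode`. The richer six-position set distinguishes the second and third characters of a term from later interior ones, which is one plausible reason finer tag sets can help on long multi-character terms.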
Authors: WANG Jian (王健), YIN Xu (殷旭), LYU Xue-qiang (吕学强), XU Li-ping (徐丽萍)
Affiliations: Beijing Key Laboratory of Internet Culture and Digital Dissemination Research, Beijing Information Science and Technology University, Beijing 100101, China; Beijing Research Center of Urban System Engineering, Beijing 100089, China
Source: Computer Engineering and Design (《计算机工程与设计》), 2019, No. 1, pp. 279-284 (6 pages); indexed as a Peking University core journal
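Funding: National Natural Science Foundation of China (61671070); Beijing Advanced Innovation Center for Imaging Technology (BAICIT-2016003); Major Project of the National Social Science Foundation of China (14@ZH036); Key Project of the State Language Commission of China (ZDI135-53); Major Project of the State Language Commission of China (ZDA125-26)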
Keywords: Chinese patent terminology; term extraction; conditional random fields (CRFs); sequence labeling; new energy vehicles