
Extracting Titles from Scientific References in Patents with Fusion of Representation Learning and Machine Learning

Original title: 融合表示学习与机器学习的专利科学引文标题自动抽取研究 (cited by: 1)
Abstract: [Objective] Patent citations fall into many categories. This paper studies how to automatically identify one specific category, scientific references in patents (SRP), and then accurately extract the title field from each SRP, to support subsequent in-depth analysis and mining of SRPs. [Methods] First, the representation learning method Doc2Vec generates a semantic vector for each patent citation as a whole, and a machine learning classifier (a support vector machine, SVM) identifies the SRPs. Then, representation learning generates semantic vectors for the content metadata of each SRP, such as the title, and an SVM classifier extracts the title. [Results] In experiments on patent citations from the genetic field, the precision of SRP identification reached 99.27% and the precision of title extraction reached 92.59%; the latter is 5.96% higher than that of a purely machine learning method. [Limitations] Manually annotating the training set is time-consuming, and the method places some requirements on the format of the experimental data. [Conclusions] The proposed method performs well on both SRP identification and title extraction.
Authors: Zhang Jinzhu (张金柱); Hu Yiming (胡一鸣) (School of Economics and Management, Nanjing University of Science and Technology, Nanjing 210094, China)
Source: Data Analysis and Knowledge Discovery (《数据分析与知识发现》; indexed in CSSCI, CSCD, and the Peking University Core Journals list), 2019, Issue 5, pp. 68-76 (9 pages)
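Funding: This work is one of the research outcomes of the National Natural Science Foundation of China Youth Project "Dynamic Identification and Formation Mechanism of Breakthrough Innovation Based on Mutations in Cited Scientific Knowledge" (Grant No. 71503125); the National Key R&D Program sub-project "Intellectual Property Big Data Mining Technology, Intelligent Push Technology and Application Demonstration" (Grant No. 2017YFB1401903); the Jiangsu Social Science Foundation Youth Project "Topic Mutation Monitoring and Formation Mechanism Based on the Dynamic Evolution of Community Structure" (Grant No. 17TQC003); and the Fundamental Research Funds for the Central Universities project "Metadata Extraction from Scientific References in Patents Based on Representation Learning" (Grant No. 30918013108).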
Keywords: Scientific References in Patent; Metadata Extraction; Machine Learning; Representation Learning