
Research on the Method of Chinese Patent Automatic Classification Based on Deep Learning

Times cited: 23
Abstract

[Purpose/significance] Current patent examination and patent intelligence analysis in China must classify very large volumes of patents. To meet this need, this paper designs seven deep-learning-based methods for automatic patent classification and compares their performance, with the aim of improving both the efficiency and the quality of patent classification.

[Method/process] To address the shortcomings of traditional machine learning methods, seven deep learning models were built on techniques such as Word2Vec, CNN, RNN and the attention mechanism, taking into account the word-order features, contextual features and key classification features of patent text. The models include Word2Vec+TextCNN, Word2Vec+GRU, Word2Vec+BiGRU and Word2Vec+BiGRU+TextCNN. Using Chinese patents as the case and the section of the main International Patent Classification (IPC) code as the class label, the study compared these seven models with three traditional classification models on the Chinese patent classification task.

[Result/conclusion] The empirical results show that deep learning methods that capture word order and context and reinforce key features achieve better performance in Chinese patent classification.
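The abstract uses the section of a patent's main IPC code as the class label. The sketch below shows one way such labels might be derived before training any classifier; it is not the authors' code. It relies only on the fact that the IPC has eight sections, A to H, and that every IPC symbol begins with its section letter. The helper names `ipc_section` and `label_records`, the input format (pairs of patent text and a main IPC code string such as "G06F 17/30") and the sample records are illustrative assumptions.

```python
# Minimal sketch (hypothetical helpers, not the paper's implementation):
# derive the IPC "section" class label from a patent's main IPC code.

# The eight IPC sections, keyed by their letter.
IPC_SECTIONS = {
    "A": "Human Necessities",
    "B": "Performing Operations; Transporting",
    "C": "Chemistry; Metallurgy",
    "D": "Textiles; Paper",
    "E": "Fixed Constructions",
    "F": "Mechanical Engineering; Lighting; Heating; Weapons; Blasting",
    "G": "Physics",
    "H": "Electricity",
}


def ipc_section(main_ipc: str) -> str:
    """Return the section letter (A-H) of a main IPC code such as 'G06F 17/30'."""
    code = main_ipc.strip().upper()
    if not code or code[0] not in IPC_SECTIONS:
        raise ValueError(f"not a valid IPC code: {main_ipc!r}")
    return code[0]


def label_records(records):
    """Pair each (text, main_ipc) record with its section label, skipping invalid codes."""
    labeled = []
    for text, main_ipc in records:
        try:
            labeled.append((text, ipc_section(main_ipc)))
        except ValueError:
            # Assumed policy: drop records without a usable main IPC code.
            continue
    return labeled


if __name__ == "__main__":
    # Hypothetical sample records: (patent text, main IPC code).
    sample = [
        ("an image recognition method", "G06K 9/62"),
        ("a cathode material for lithium batteries", "h01m 4/36"),
        ("a record with no classification code", ""),
    ]
    for text, label in label_records(sample):
        print(label, IPC_SECTIONS[label], text)
```

Records with a missing or malformed code are dropped here rather than guessed; the abstract does not say how the paper handled such records.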
Authors: 吕璐成 (Lyu Lucheng), 韩涛 (Han Tao), 周健 (Zhou Jian), 赵亚娟 (Zhao Yajuan) (National Science Library, Chinese Academy of Sciences, Beijing 100190; Department of Library, Information and Archives Management, School of Economics and Management, University of Chinese Academy of Sciences, Beijing 100190; Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)
Source: Library and Information Service (《图书情报工作》), 2020, No. 10, pp. 75-85 (11 pages). Indexed in CSSCI and the Peking University Core Journals list.
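Funding: One of the research outputs of the Chinese Academy of Sciences Youth Talent Project "Deep-Learning-Based Classification of Patents by Industry" (project no. G180161001).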
Keywords: patent automatic classification; deep learning; word embedding; patent text mining